2013-06-17 17:52:11

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH -v2 0/4] EFI 1:1 mapping

From: Borislav Petkov <[email protected]>

Hi all,

this is just a snapshot of the current state of affairs. The patchset
starts to boot successfully on real hardware now but we're far away from
the coverage we'd like to have before we even consider upstreaming it.

And yes, considering the sick f*ck EFI is, we're keeping the 1:1 mapping
optional and off by default (you need to boot with "efi=1:1_map" to
enable it).

Matt has picked up 1/4 already so I'll drop it when it lands into -tip
and so on...

Thanks for any suggestions, as always.

Borislav Petkov (4):
efi: Convert runtime services function ptrs
x86, cpa: Map in an arbitrary pgd
x86, efi: Add an efi= kernel command line parameter
x86, efi: Map runtime services 1:1

arch/x86/boot/compressed/eboot.c | 2 +-
arch/x86/include/asm/efi.h | 81 ++++++++++-----
arch/x86/include/asm/pgtable_types.h | 3 +-
arch/x86/mm/pageattr.c | 82 ++++++++++++----
arch/x86/platform/efi/efi.c | 184 +++++++++++++++++++++++++++++------
arch/x86/platform/efi/efi_stub_64.S | 56 +++++++++++
include/linux/efi.h | 28 +++---
7 files changed, 348 insertions(+), 88 deletions(-)

--
1.8.3


2013-06-17 17:51:03

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH -v2 4/4] x86, efi: Map runtime services 1:1

From: Borislav Petkov <[email protected]>

Due to the braindead design of EFI, we cannot map runtime services more
than once for the duration of a booted system. Thus, if we want to use
EFI runtime services in a kexec'ed kernel, maybe the only possible and
sensible approach would be to map them 1:1 so that when the kexec kernel
loads, it can simply call those addresses without the need for remapping
(which doesn't work anyway).

Furthermore, this mapping approach could be of help with b0rked EFI
implementations for a different set of reasons.

This implementation is 64-bit only for now.

Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/efi.h | 67 +++++++++++----
arch/x86/platform/efi/efi.c | 165 +++++++++++++++++++++++++++++-------
arch/x86/platform/efi/efi_stub_64.S | 56 ++++++++++++
3 files changed, 240 insertions(+), 48 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 5b33686b6995..3adeef4a0064 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -39,8 +39,13 @@ extern unsigned long asmlinkage efi_call_phys(void *, ...);

#else /* !CONFIG_X86_32 */

+#include <linux/sched.h>
+
#define EFI_LOADER_SIGNATURE "EL64"

+extern pgd_t *efi_pgt;
+extern bool efi_use_11_map;
+
extern u64 efi_call0(void *fp);
extern u64 efi_call1(void *fp, u64 arg1);
extern u64 efi_call2(void *fp, u64 arg1, u64 arg2);
@@ -51,6 +56,22 @@ extern u64 efi_call5(void *fp, u64 arg1, u64 arg2, u64 arg3,
extern u64 efi_call6(void *fp, u64 arg1, u64 arg2, u64 arg3,
u64 arg4, u64 arg5, u64 arg6);

+/*
+ * map-in low kernel mapping for passing arguments to EFI functions.
+ */
+static inline void efi_sync_low_kernel_mappings(void)
+{
+ unsigned num_pgds;
+ pgd_t *pgd;
+
+ pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+ num_pgds = pgd_index(VMALLOC_START - 1) - pgd_index(PAGE_OFFSET);
+
+ memcpy(pgd + pgd_index(PAGE_OFFSET),
+ init_mm.pgd + pgd_index(PAGE_OFFSET),
+ sizeof(pgd_t) * num_pgds);
+}
+
#define efi_call_phys0(f) \
efi_call0((f))
#define efi_call_phys1(f, a1) \
@@ -69,24 +90,36 @@ extern u64 efi_call6(void *fp, u64 arg1, u64 arg2, u64 arg3,
efi_call6((f), (u64)(a1), (u64)(a2), (u64)(a3), \
(u64)(a4), (u64)(a5), (u64)(a6))

+#define _efi_call_virtX(x, f, ...) \
+({ \
+ efi_status_t __s; \
+ \
+ if (efi_use_11_map) { \
+ efi_sync_low_kernel_mappings(); \
+ preempt_disable(); \
+ } \
+ \
+ __s = efi_call##x(efi.systab->runtime->f, __VA_ARGS__); \
+ \
+ if (efi_use_11_map) \
+ preempt_enable(); \
+ __s; \
+})
+
#define efi_call_virt0(f) \
- efi_call0((efi.systab->runtime->f))
-#define efi_call_virt1(f, a1) \
- efi_call1((efi.systab->runtime->f), (u64)(a1))
-#define efi_call_virt2(f, a1, a2) \
- efi_call2((efi.systab->runtime->f), (u64)(a1), (u64)(a2))
-#define efi_call_virt3(f, a1, a2, a3) \
- efi_call3((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
- (u64)(a3))
-#define efi_call_virt4(f, a1, a2, a3, a4) \
- efi_call4((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
- (u64)(a3), (u64)(a4))
-#define efi_call_virt5(f, a1, a2, a3, a4, a5) \
- efi_call5((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
- (u64)(a3), (u64)(a4), (u64)(a5))
-#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6) \
- efi_call6((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
- (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
+ _efi_call_virtX(0, f)
+#define efi_call_virt1(f, a1) \
+ _efi_call_virtX(1, f, (u64)(a1))
+#define efi_call_virt2(f, a1, a2) \
+ _efi_call_virtX(2, f, (u64)(a1), (u64)(a2))
+#define efi_call_virt3(f, a1, a2, a3) \
+ _efi_call_virtX(3, f, (u64)(a1), (u64)(a2), (u64)(a3))
+#define efi_call_virt4(f, a1, a2, a3, a4) \
+ _efi_call_virtX(4, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4))
+#define efi_call_virt5(f, a1, a2, a3, a4, a5) \
+ _efi_call_virtX(5, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5))
+#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6) \
+ _efi_call_virtX(6, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))

extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
u32 type, u64 attribute);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 5af5b97bf203..5409f1ccc9e3 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -94,6 +94,17 @@ unsigned long x86_efi_facility;
static unsigned long efi_config;

/*
+ * Scratch space for 1:1 mapping
+ */
+struct efi_scratch {
+ u64 r15;
+ u64 prev_cr3;
+ pgd_t *pgt11;
+};
+
+extern struct efi_scratch efi_scratch;
+
+/*
* Returns 1 if 'facility' is enabled, 0 otherwise.
*/
int efi_enabled(int facility)
@@ -763,6 +774,25 @@ static int __init efi_runtime_init(void)
* virtual mode.
*/
efi.get_time = phys_efi_get_time;
+
+ if (efi_config & EFI_CFG_MAP11) {
+#define efi_phys_assign(f) \
+ efi_phys.f = (efi_ ##f## _t *)runtime->f
+
+ efi_phys_assign(set_time);
+ efi_phys_assign(get_wakeup_time);
+ efi_phys_assign(set_wakeup_time);
+ efi_phys_assign(get_variable);
+ efi_phys_assign(get_next_variable);
+ efi_phys_assign(set_variable);
+ efi_phys_assign(get_next_high_mono_count);
+ efi_phys_assign(reset_system);
+ efi_phys_assign(set_virtual_address_map);
+ efi_phys_assign(query_variable_info);
+ efi_phys_assign(update_capsule);
+ efi_phys_assign(query_capsule_caps);
+ }
+
early_iounmap(runtime, sizeof(efi_runtime_services_t));

return 0;
@@ -954,6 +984,65 @@ void efi_memory_uc(u64 addr, unsigned long size)
set_memory_uc(addr, npages);
}

+static void __init __runtime_map_11(efi_memory_desc_t *md)
+{
+ unsigned long page_flags = 0;
+ pgd_t *pgd = NULL;
+
+#ifdef CONFIG_X86_64
+ pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+#endif
+
+ if (md->type == EFI_RUNTIME_SERVICES_DATA ||
+ md->type == EFI_BOOT_SERVICES_DATA)
+ page_flags |= _PAGE_NX;
+
+ if (!(md->attribute & EFI_MEMORY_WB))
+ page_flags |= _PAGE_PCD;
+
+ kernel_map_pages_in_pgd(pgd + pgd_index(md->phys_addr),
+ md->phys_addr,
+ md->num_pages,
+ page_flags);
+
+ md->virt_addr = md->phys_addr;
+}
+
+static int __init __runtime_ioremap(efi_memory_desc_t *md)
+{
+ u64 end, systab, start_pfn, end_pfn;
+ unsigned long size;
+ void *va;
+
+ size = md->num_pages << EFI_PAGE_SHIFT;
+ end = md->phys_addr + size;
+ start_pfn = PFN_DOWN(md->phys_addr);
+ end_pfn = PFN_UP(end);
+
+ if (pfn_range_is_mapped(start_pfn, end_pfn)) {
+ va = __va(md->phys_addr);
+
+ if (!(md->attribute & EFI_MEMORY_WB))
+ efi_memory_uc((u64)(unsigned long)va, size);
+ } else
+ va = efi_ioremap(md->phys_addr, size, md->type, md->attribute);
+
+ md->virt_addr = (u64) (unsigned long) va;
+ if (!va) {
+ pr_err("ioremap of 0x%llX failed!\n",
+ (unsigned long long)md->phys_addr);
+ return 1;
+ }
+
+ systab = (u64) (unsigned long) efi_phys.systab;
+ if (md->phys_addr <= systab && systab < end) {
+ systab += md->virt_addr - md->phys_addr;
+ efi.systab = (efi_system_table_t *) (unsigned long) systab;
+ }
+
+ return 0;
+}
+
/*
* This function will switch the EFI runtime services to virtual mode.
* Essentially, look through the EFI memmap and map every region that
@@ -966,9 +1055,7 @@ void __init efi_enter_virtual_mode(void)
{
efi_memory_desc_t *md, *prev_md = NULL;
efi_status_t status;
- unsigned long size;
- u64 end, systab, start_pfn, end_pfn;
- void *p, *va, *new_memmap = NULL;
+ void *p, *new_memmap = NULL;
int count = 0;

efi.systab = NULL;
@@ -1017,33 +1104,18 @@ void __init efi_enter_virtual_mode(void)
md->type != EFI_BOOT_SERVICES_DATA)
continue;

- size = md->num_pages << EFI_PAGE_SHIFT;
- end = md->phys_addr + size;
-
- start_pfn = PFN_DOWN(md->phys_addr);
- end_pfn = PFN_UP(end);
- if (pfn_range_is_mapped(start_pfn, end_pfn)) {
- va = __va(md->phys_addr);
-
- if (!(md->attribute & EFI_MEMORY_WB))
- efi_memory_uc((u64)(unsigned long)va, size);
- } else
- va = efi_ioremap(md->phys_addr, size,
- md->type, md->attribute);
-
- md->virt_addr = (u64) (unsigned long) va;
-
- if (!va) {
- pr_err("ioremap of 0x%llX failed!\n",
- (unsigned long long)md->phys_addr);
+ /*
+ * XXX: need to map the region which contains
+ * SetVirtualAddressMap so that we can call it here.
+ * Probably can be removed after we map boot services 1:1
+ * too.
+ */
+ if (__runtime_ioremap(md))
continue;
- }

- systab = (u64) (unsigned long) efi_phys.systab;
- if (md->phys_addr <= systab && systab < end) {
- systab += md->virt_addr - md->phys_addr;
- efi.systab = (efi_system_table_t *) (unsigned long) systab;
- }
+ if (efi_config & EFI_CFG_MAP11)
+ __runtime_map_11(md);
+
new_memmap = krealloc(new_memmap,
(count + 1) * memmap.desc_size,
GFP_KERNEL);
@@ -1052,7 +1124,8 @@ void __init efi_enter_virtual_mode(void)
count++;
}

- BUG_ON(!efi.systab);
+ if (!(efi_config & EFI_CFG_MAP11))
+ BUG_ON(!efi.systab);

status = phys_efi_set_virtual_address_map(
memmap.desc_size * count,
@@ -1072,6 +1145,34 @@ void __init efi_enter_virtual_mode(void)
*
* Call EFI services through wrapper functions.
*/
+ if (efi_config & EFI_CFG_MAP11) {
+#define efi_assign(efi, f) efi.systab->runtime->f = efi_phys.f
+
+ efi.systab->runtime = kzalloc(sizeof(efi_runtime_services_t),
+ GFP_KERNEL);
+ BUG_ON(!efi.systab->runtime);
+
+ efi_assign(efi, get_time);
+ efi_assign(efi, set_time);
+ efi_assign(efi, get_wakeup_time);
+ efi_assign(efi, set_wakeup_time);
+ efi_assign(efi, get_variable);
+ efi_assign(efi, get_next_variable);
+ efi_assign(efi, set_variable);
+ efi_assign(efi, get_next_high_mono_count);
+ efi_assign(efi, reset_system);
+ efi_assign(efi, query_variable_info);
+ efi_assign(efi, update_capsule);
+ efi_assign(efi, query_capsule_caps);
+
+#ifdef CONFIG_X86_64
+ efi_scratch.pgt11 = (pgd_t *)(unsigned long)real_mode_header->trampoline_pgd;
+#endif
+ efi_use_11_map = true;
+
+ pr_info("Using 1:1 map.\n");
+ }
+
efi.runtime_version = efi_systab.hdr.revision;
efi.get_time = virt_efi_get_time;
efi.set_time = virt_efi_set_time;
@@ -1086,8 +1187,10 @@ void __init efi_enter_virtual_mode(void)
efi.query_variable_info = virt_efi_query_variable_info;
efi.update_capsule = virt_efi_update_capsule;
efi.query_capsule_caps = virt_efi_query_capsule_caps;
- if (__supported_pte_mask & _PAGE_NX)
- runtime_code_page_mkexec();
+
+ if (!(efi_config & EFI_CFG_MAP11))
+ if (__supported_pte_mask & _PAGE_NX)
+ runtime_code_page_mkexec();

kfree(new_memmap);
}
diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
index 4c07ccab8146..a5f1ef4fb14a 100644
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -34,10 +34,47 @@
mov %rsi, %cr0; \
mov (%rsp), %rsp

+/* stolen from gcc */
+ .macro FLUSH_TLB_ALL
+ movq %r15, efi_scratch
+ movq %r14, efi_scratch+8
+ movq %cr4, %r15
+ movq %r15, %r14
+ andb $0x7f, %r14b
+ movq %r14, %cr4
+ movq %r15, %cr4
+ movq efi_scratch+8, %r14
+ movq efi_scratch, %r15
+ .endm
+
+ .macro SWITCH_PGT
+ cmpb $0, efi_use_11_map
+ je 1f;
+ movq %r15, efi_scratch # r15
+ # save previous CR3
+ movq %cr3, %r15
+ movq %r15, efi_scratch+8 # prev_cr3
+ movq efi_scratch+16, %r15 # 1:1 pgt
+ movq %r15, %cr3
+1:
+ .endm
+
+ .macro RESTORE_PGT
+ cmpb $0, efi_use_11_map
+ je 2f
+ movq efi_scratch+8, %r15
+ movq %r15, %cr3
+ movq efi_scratch, %r15
+ FLUSH_TLB_ALL
+2:
+ .endm
+
ENTRY(efi_call0)
SAVE_XMM
subq $32, %rsp
+ SWITCH_PGT
call *%rdi
+ RESTORE_PGT
addq $32, %rsp
RESTORE_XMM
ret
@@ -47,7 +84,9 @@ ENTRY(efi_call1)
SAVE_XMM
subq $32, %rsp
mov %rsi, %rcx
+ SWITCH_PGT
call *%rdi
+ RESTORE_PGT
addq $32, %rsp
RESTORE_XMM
ret
@@ -57,7 +96,9 @@ ENTRY(efi_call2)
SAVE_XMM
subq $32, %rsp
mov %rsi, %rcx
+ SWITCH_PGT
call *%rdi
+ RESTORE_PGT
addq $32, %rsp
RESTORE_XMM
ret
@@ -68,7 +109,9 @@ ENTRY(efi_call3)
subq $32, %rsp
mov %rcx, %r8
mov %rsi, %rcx
+ SWITCH_PGT
call *%rdi
+ RESTORE_PGT
addq $32, %rsp
RESTORE_XMM
ret
@@ -80,7 +123,9 @@ ENTRY(efi_call4)
mov %r8, %r9
mov %rcx, %r8
mov %rsi, %rcx
+ SWITCH_PGT
call *%rdi
+ RESTORE_PGT
addq $32, %rsp
RESTORE_XMM
ret
@@ -93,7 +138,9 @@ ENTRY(efi_call5)
mov %r8, %r9
mov %rcx, %r8
mov %rsi, %rcx
+ SWITCH_PGT
call *%rdi
+ RESTORE_PGT
addq $48, %rsp
RESTORE_XMM
ret
@@ -109,8 +156,17 @@ ENTRY(efi_call6)
mov %r8, %r9
mov %rcx, %r8
mov %rsi, %rcx
+ SWITCH_PGT
call *%rdi
+ RESTORE_PGT
addq $48, %rsp
RESTORE_XMM
ret
ENDPROC(efi_call6)
+
+ .data
+ENTRY(efi_use_11_map)
+ .byte 0
+
+ENTRY(efi_scratch)
+ .fill 3,8,0
--
1.8.3

2013-06-17 17:51:01

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH -v2 2/4] x86, cpa: Map in an arbitrary pgd

From: Borislav Petkov <[email protected]>

Add the ability to map pages in an arbitrary pgd.

Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/pgtable_types.h | 3 +-
arch/x86/mm/pageattr.c | 82 ++++++++++++++++++++++++++++--------
2 files changed, 67 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index e6423002c10b..0613e147f083 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -352,7 +352,8 @@ static inline void update_page_count(int level, unsigned long pages) { }
*/
extern pte_t *lookup_address(unsigned long address, unsigned int *level);
extern phys_addr_t slow_virt_to_phys(void *__address);
-
+extern void kernel_map_pages_in_pgd(pgd_t *pgd, unsigned long address,
+ unsigned numpages, unsigned long page_flags);
#endif /* !__ASSEMBLY__ */

#endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index bb32480c2d71..b770b334c97e 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -30,6 +30,7 @@
*/
struct cpa_data {
unsigned long *vaddr;
+ pgd_t *pgd;
pgprot_t mask_set;
pgprot_t mask_clr;
int numpages;
@@ -322,17 +323,9 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
return prot;
}

-/*
- * Lookup the page table entry for a virtual address. Return a pointer
- * to the entry and the level of the mapping.
- *
- * Note: We return pud and pmd either when the entry is marked large
- * or when the present bit is not set. Otherwise we would return a
- * pointer to a nonexisting mapping.
- */
-pte_t *lookup_address(unsigned long address, unsigned int *level)
+static pte_t *
+__lookup_address_in_pgd(pgd_t *pgd, unsigned long address, unsigned int *level)
{
- pgd_t *pgd = pgd_offset_k(address);
pud_t *pud;
pmd_t *pmd;

@@ -361,8 +354,30 @@ pte_t *lookup_address(unsigned long address, unsigned int *level)

return pte_offset_kernel(pmd, address);
}
+
+/*
+ * Lookup the page table entry for a virtual address. Return a pointer
+ * to the entry and the level of the mapping.
+ *
+ * Note: We return pud and pmd either when the entry is marked large
+ * or when the present bit is not set. Otherwise we would return a
+ * pointer to a nonexisting mapping.
+ */
+pte_t *lookup_address(unsigned long address, unsigned int *level)
+{
+ return __lookup_address_in_pgd(pgd_offset_k(address), address, level);
+}
EXPORT_SYMBOL_GPL(lookup_address);

+pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address,
+ unsigned int *level)
+{
+ if (cpa->pgd)
+ return __lookup_address_in_pgd(cpa->pgd, address, level);
+
+ return lookup_address(address, level);
+}
+
/*
* This is necessary because __pa() does not work on some
* kinds of memory, like vmalloc() or the alloc_remap()
@@ -437,7 +452,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
* Check for races, another CPU might have split this page
* up already:
*/
- tmp = lookup_address(address, &level);
+ tmp = _lookup_address_cpa(cpa, address, &level);
if (tmp != kpte)
goto out_unlock;

@@ -543,7 +558,8 @@ out_unlock:
}

static int
-__split_large_page(pte_t *kpte, unsigned long address, struct page *base)
+__split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
+ struct page *base)
{
pte_t *pbase = (pte_t *)page_address(base);
unsigned long pfn, pfninc = 1;
@@ -556,7 +572,7 @@ __split_large_page(pte_t *kpte, unsigned long address, struct page *base)
* Check for races, another CPU might have split this page
* up for us already:
*/
- tmp = lookup_address(address, &level);
+ tmp = _lookup_address_cpa(cpa, address, &level);
if (tmp != kpte) {
spin_unlock(&pgd_lock);
return 1;
@@ -632,7 +648,8 @@ __split_large_page(pte_t *kpte, unsigned long address, struct page *base)
return 0;
}

-static int split_large_page(pte_t *kpte, unsigned long address)
+static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
+ unsigned long address)
{
struct page *base;

@@ -644,7 +661,7 @@ static int split_large_page(pte_t *kpte, unsigned long address)
if (!base)
return -ENOMEM;

- if (__split_large_page(kpte, address, base))
+ if (__split_large_page(cpa, kpte, address, base))
__free_page(base);

return 0;
@@ -697,7 +714,10 @@ static int __change_page_attr(struct cpa_data *cpa, int primary)
else
address = *cpa->vaddr;
repeat:
- kpte = lookup_address(address, &level);
+ if (cpa->pgd)
+ kpte = __lookup_address_in_pgd(cpa->pgd, address, &level);
+ else
+ kpte = _lookup_address_cpa(cpa, address, &level);
if (!kpte)
return __cpa_process_fault(cpa, address, primary);

@@ -761,7 +781,7 @@ repeat:
/*
* We have to split the large page:
*/
- err = split_large_page(kpte, address);
+ err = split_large_page(cpa, kpte, address);
if (!err) {
/*
* Do a global flush tlb after splitting the large page
@@ -910,6 +930,8 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
int ret, cache, checkalias;
unsigned long baddr = 0;

+ memset(&cpa, 0, sizeof(cpa));
+
/*
* Check, if we are requested to change a not supported
* feature:
@@ -1356,6 +1378,7 @@ static int __set_pages_p(struct page *page, int numpages)
{
unsigned long tempaddr = (unsigned long) page_address(page);
struct cpa_data cpa = { .vaddr = &tempaddr,
+ .pgd = 0,
.numpages = numpages,
.mask_set = __pgprot(_PAGE_PRESENT | _PAGE_RW),
.mask_clr = __pgprot(0),
@@ -1374,6 +1397,7 @@ static int __set_pages_np(struct page *page, int numpages)
{
unsigned long tempaddr = (unsigned long) page_address(page);
struct cpa_data cpa = { .vaddr = &tempaddr,
+ .pgd = 0,
.numpages = numpages,
.mask_set = __pgprot(0),
.mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
@@ -1434,6 +1458,30 @@ bool kernel_page_present(struct page *page)

#endif /* CONFIG_DEBUG_PAGEALLOC */

+void kernel_map_pages_in_pgd(pgd_t *pgd, unsigned long address,
+ unsigned numpages, unsigned long page_flags)
+{
+ struct cpa_data cpa = {
+ .vaddr = &address,
+ .pgd = pgd,
+ .numpages = numpages,
+ .mask_set = __pgprot(0),
+ .mask_clr = __pgprot(0),
+ .flags = 0
+ };
+
+ if (!(__supported_pte_mask & _PAGE_NX))
+ return;
+
+ if (!(page_flags & _PAGE_NX))
+ cpa.mask_clr = __pgprot(_PAGE_NX);
+
+ cpa.mask_set = __pgprot(_PAGE_PRESENT | page_flags);
+
+ __change_page_attr_set_clr(&cpa, 0);
+ __flush_tlb_all();
+}
+
/*
* The testcases use internal knowledge of the implementation that shouldn't
* be exposed to the rest of the kernel. Include these directly here.
--
1.8.3

2013-06-17 17:51:38

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH -v2 1/4] efi: Convert runtime services function ptrs

From: Borislav Petkov <[email protected]>

... to void * like the boot services and lose all the void * casts. No
functionality change.

Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/boot/compressed/eboot.c | 2 +-
arch/x86/include/asm/efi.h | 28 ++++++++++++++--------------
include/linux/efi.h | 28 ++++++++++++++--------------
3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 35ee62fccf98..4060c8daf05e 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -266,7 +266,7 @@ static efi_status_t setup_efi_vars(struct boot_params *params)
while (data && data->next)
data = (struct setup_data *)(unsigned long)data->next;

- status = efi_call_phys4((void *)sys_table->runtime->query_variable_info,
+ status = efi_call_phys4(sys_table->runtime->query_variable_info,
EFI_VARIABLE_NON_VOLATILE |
EFI_VARIABLE_BOOTSERVICE_ACCESS |
EFI_VARIABLE_RUNTIME_ACCESS, &store_size,
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 2fb5d5884e23..5b33686b6995 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -52,40 +52,40 @@ extern u64 efi_call6(void *fp, u64 arg1, u64 arg2, u64 arg3,
u64 arg4, u64 arg5, u64 arg6);

#define efi_call_phys0(f) \
- efi_call0((void *)(f))
+ efi_call0((f))
#define efi_call_phys1(f, a1) \
- efi_call1((void *)(f), (u64)(a1))
+ efi_call1((f), (u64)(a1))
#define efi_call_phys2(f, a1, a2) \
- efi_call2((void *)(f), (u64)(a1), (u64)(a2))
+ efi_call2((f), (u64)(a1), (u64)(a2))
#define efi_call_phys3(f, a1, a2, a3) \
- efi_call3((void *)(f), (u64)(a1), (u64)(a2), (u64)(a3))
+ efi_call3((f), (u64)(a1), (u64)(a2), (u64)(a3))
#define efi_call_phys4(f, a1, a2, a3, a4) \
- efi_call4((void *)(f), (u64)(a1), (u64)(a2), (u64)(a3), \
+ efi_call4((f), (u64)(a1), (u64)(a2), (u64)(a3), \
(u64)(a4))
#define efi_call_phys5(f, a1, a2, a3, a4, a5) \
- efi_call5((void *)(f), (u64)(a1), (u64)(a2), (u64)(a3), \
+ efi_call5((f), (u64)(a1), (u64)(a2), (u64)(a3), \
(u64)(a4), (u64)(a5))
#define efi_call_phys6(f, a1, a2, a3, a4, a5, a6) \
- efi_call6((void *)(f), (u64)(a1), (u64)(a2), (u64)(a3), \
+ efi_call6((f), (u64)(a1), (u64)(a2), (u64)(a3), \
(u64)(a4), (u64)(a5), (u64)(a6))

#define efi_call_virt0(f) \
- efi_call0((void *)(efi.systab->runtime->f))
+ efi_call0((efi.systab->runtime->f))
#define efi_call_virt1(f, a1) \
- efi_call1((void *)(efi.systab->runtime->f), (u64)(a1))
+ efi_call1((efi.systab->runtime->f), (u64)(a1))
#define efi_call_virt2(f, a1, a2) \
- efi_call2((void *)(efi.systab->runtime->f), (u64)(a1), (u64)(a2))
+ efi_call2((efi.systab->runtime->f), (u64)(a1), (u64)(a2))
#define efi_call_virt3(f, a1, a2, a3) \
- efi_call3((void *)(efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
+ efi_call3((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
(u64)(a3))
#define efi_call_virt4(f, a1, a2, a3, a4) \
- efi_call4((void *)(efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
+ efi_call4((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
(u64)(a3), (u64)(a4))
#define efi_call_virt5(f, a1, a2, a3, a4, a5) \
- efi_call5((void *)(efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
+ efi_call5((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
(u64)(a3), (u64)(a4), (u64)(a5))
#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6) \
- efi_call6((void *)(efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
+ efi_call6((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
(u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))

extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 2bc0ad78d058..21ae6b3c0359 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -287,20 +287,20 @@ typedef struct {

typedef struct {
efi_table_hdr_t hdr;
- unsigned long get_time;
- unsigned long set_time;
- unsigned long get_wakeup_time;
- unsigned long set_wakeup_time;
- unsigned long set_virtual_address_map;
- unsigned long convert_pointer;
- unsigned long get_variable;
- unsigned long get_next_variable;
- unsigned long set_variable;
- unsigned long get_next_high_mono_count;
- unsigned long reset_system;
- unsigned long update_capsule;
- unsigned long query_capsule_caps;
- unsigned long query_variable_info;
+ void *get_time;
+ void *set_time;
+ void *get_wakeup_time;
+ void *set_wakeup_time;
+ void *set_virtual_address_map;
+ void *convert_pointer;
+ void *get_variable;
+ void *get_next_variable;
+ void *set_variable;
+ void *get_next_high_mono_count;
+ void *reset_system;
+ void *update_capsule;
+ void *query_capsule_caps;
+ void *query_variable_info;
} efi_runtime_services_t;

typedef efi_status_t efi_get_time_t (efi_time_t *tm, efi_time_cap_t *tc);
--
1.8.3

2013-06-17 17:51:53

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH -v2 3/4] x86, efi: Add an efi= kernel command line parameter

From: Borislav Petkov <[email protected]>

.. for passing miscellaneous options from the kernel command line.

Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/platform/efi/efi.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 82089d8b1954..5af5b97bf203 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -88,6 +88,11 @@ static u64 active_size;

unsigned long x86_efi_facility;

+ /* 1:1 mapping of services regions */
+#define EFI_CFG_MAP11 BIT(0)
+
+static unsigned long efi_config;
+
/*
* Returns 1 if 'facility' is enabled, 0 otherwise.
*/
@@ -1167,3 +1172,17 @@ efi_status_t efi_query_variable_store(u32 attributes, unsigned long size)
return EFI_SUCCESS;
}
EXPORT_SYMBOL_GPL(efi_query_variable_store);
+
+static int __init parse_efi_cmdline(char *str)
+{
+ if (*str == '=')
+ str++;
+
+#ifdef CONFIG_X86_64
+ if (!strncmp(str, "1:1_map", 7))
+ efi_config |= EFI_CFG_MAP11;
+#endif
+
+ return 0;
+}
+early_param("efi", parse_efi_cmdline);
--
1.8.3

2013-06-19 12:52:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping


* Borislav Petkov <[email protected]> wrote:

> From: Borislav Petkov <[email protected]>
>
> Hi all,
>
> this is just a snapshot of the current state of affairs. The patchset
> starts to boot successfully on real hardware now but we're far away from
> the coverage we'd like to have before we even consider upstreaming it.

Looks pretty clean.

> And yes, considering the sick f*ck EFI is, we're keeping the 1:1 mapping
> optional and off by default (you need to boot with "efi=1:1_map" to
> enable it).

I hope making it a weird boot option is not the end plan, there's little
point in _not_ enabling 1:1 mappings by default eventually: the 1:1
mapping is supposed to emulate a "Windows compatible" EFI environment
better and is expected to work around certain EFI runtime crashes.

Thanks,

Ingo

2013-06-19 13:02:35

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 02:52:43PM +0200, Ingo Molnar wrote:
> I hope making it a weird boot option is not the end plan, there's
> little point in _not_ enabling 1:1 mappings by default eventually:
> the 1:1 mapping is supposed to emulate a "Windows compatible" EFI
> environment better and is expected to work around certain EFI runtime
> crashes.

And yet there are the Macs which reportedly cannot stomach this.

And then there's the issue where some boxes cannot boot through the EFI
stub with those patches even without "efi=1:1_map" on the command line.
The issue has something to do with the "cmpb $0, efi_use_11_map" in the
efi_callX stubs.

And then again, other boxes have no problem with it and boot perfectly
fine.

So I don't know - it all looks like a weird boot, opt-in option for now.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-19 13:04:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping


* Borislav Petkov <[email protected]> wrote:

> On Wed, Jun 19, 2013 at 02:52:43PM +0200, Ingo Molnar wrote:
> > I hope making it a weird boot option is not the end plan, there's
> > little point in _not_ enabling 1:1 mappings by default eventually:
> > the 1:1 mapping is supposed to emulate a "Windows compatible" EFI
> > environment better and is expected to work around certain EFI runtime
> > crashes.
>
> And yet there are the Macs which reportedly cannot stomach this.

Do we know why?

> And then there's the issue where some boxes cannot boot through the EFI
> stub with those patches even without "efi=1:1_map" on the command line.
> The issue has something to do with the "cmpb $0, efi_use_11_map" in the
> efi_callX stubs.

A bug I suspect?

> And then again, other boxes have no problem with it and boot perfectly
> fine.
>
> So I don't know - it all looks like a weird boot, opt-in option for now.

But once it works reliably we can enable it, right?

Thanks,

Ingo

2013-06-19 13:25:29

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 03:04:34PM +0200, Ingo Molnar wrote:
> Do we know why?

Well, according to mjg59 some Macs break if we don't give them a map
which uses high addresses.

I can imagine flipping the meaning of this option to be on by default
and "efi=no_11_map" to disable the 1:1 map for those Macs.

> A bug I suspect?

Probably. The problem is, it is very hard to debug the boot stub that
early. And of course, I can't reproduce it in qemu :(. If only I had a
hardware debugger...

> But once it works reliably we can enable it, right?

It's all the same to me - I hate EFI with passion so whatever people
agree upon, I'll do it.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-19 16:05:39

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 03:02:25PM +0200, Borislav Petkov wrote:
> On Wed, Jun 19, 2013 at 02:52:43PM +0200, Ingo Molnar wrote:
> > I hope making it a weird boot option is not the end plan, there's
> > little point in _not_ enabling 1:1 mappings by default eventually:
> > the 1:1 mapping is supposed to emulate a "Windows compatible" EFI
> > environment better and is expected to work around certain EFI runtime
> > crashes.
>
> And yet there are the Macs which reportedly cannot stomach this.

I suspect they'll be fine with having a 1:1 map, as long as we pass the
high mappings via SetVirtualAddressMap().

--
Matthew Garrett | [email protected]

2013-06-19 16:08:11

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 03:04:34PM +0200, Ingo Molnar wrote:
> * Borislav Petkov <[email protected]> wrote:
> > And yet there are the Macs which reportedly cannot stomach this.
>
> Do we know why?

I got lost in a maze of pointer arithmetic. There seems to be an
assumption that nvram writes should be forbidden if in runtime mode but
with pointers still below the phys/virt split, which obviously makes no
sense but hey.

But, as always, the only reliable thing to do here is to behave as much
like Windows as possible. Which means performing the 1:1 mapping but
maintaining the high mapping, and passing the high values via
SetVirtualAddressMap.

--
Matthew Garrett | [email protected]

2013-06-19 16:18:37

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 05:08:04PM +0100, Matthew Garrett wrote:
> But, as always, the only reliable thing to do here is to behave as
> much like Windows as possible. Which means performing the 1:1 mapping
> but maintaining the high mapping, and passing the high values via
> SetVirtualAddressMap.

We can't pass the high values via SetVirtualAddressMap and have EFI
runtime in the kexec-ed kernel, as you and I established last week. And
since not all would want EFI runtime in the kexec-ed kernel, I'm leaning
more towards a boot-time option which enables the 1:1 mapping.

Btw, why would you even want the 1:1 mappings if we pass the high values
via SetVirtualAddressMap?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-19 16:21:23

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 06:18:27PM +0200, Borislav Petkov wrote:
> On Wed, Jun 19, 2013 at 05:08:04PM +0100, Matthew Garrett wrote:
> > But, as always, the only reliable thing to do here is to behave as
> > much like Windows as possible. Which means performing the 1:1 mapping
> > but maintaining the high mapping, and passing the high values via
> > SetVirtualAddressMap.
>
> We can't pass the high values via SetVirtualAddressMap and have EFI
> runtime in the kexec-ed kernel, as you and I established last week. And
> since not all would want EFI runtime in the kexec-ed kernel, I'm leaning
> more towards a boot-time option which enables the 1:1 mapping.

Yes, kexec needs a different solution.

> Btw, why would you even want the 1:1 mappings if we pass the high values
> via SetVirtualAddressMap?

Because firmware images don't always update all of the pointers, and so
will crash if the 1:1 mappings aren't present.

--
Matthew Garrett | [email protected]

2013-06-19 16:38:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 05:21:15PM +0100, Matthew Garrett wrote:
> Yes, kexec needs a different solution.

No need. If we say, "efi=use_11_map", the 1:1 map will be shoved down
SetVirtualAddressMap. Otherwise the high mappings.

> Because firmware images don't always update all of the pointers, and
> so will crash if the 1:1 mappings aren't present.

Ok, so it sounds like we want to *always* create both mappings but,
depending on what we want, to shove down SetVirtualAddressMap a
different set. And the 1:1 map will be the optional one which we give
SetVirtualAddressMap only when user wants it, i.e. when booting with
"efi=1:1_map".

Makes sense?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-19 16:48:28

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 06:38:04PM +0200, Borislav Petkov wrote:
> On Wed, Jun 19, 2013 at 05:21:15PM +0100, Matthew Garrett wrote:
> > Yes, kexec needs a different solution.
>
> No need. If we say, "efi=use_11_map", the 1:1 map will be shoved down
> SetVirtualAddressMap. Otherwise the high mappings.

Ah, sure - if we're willing to take an argument then we can leave it at
that. But having a stable set of high addresses for UEFI is also an
option.

> > Because firmware images don't always update all of the pointers, and
> > so will crash if the 1:1 mappings aren't present.
>
> Ok, so it sounds like we want to *always* create both mappings but,
> depending on what we want, to shove down SetVirtualAddressMap a
> different set. And the 1:1 map will be the optional one which we give
> SetVirtualAddressMap only when user wants it, i.e. when booting with
> "efi=1:1_map".

Yup, I think that sounds ideal.

--
Matthew Garrett | [email protected]

2013-06-19 16:50:44

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, 2013-06-19 at 18:38 +0200, Borislav Petkov wrote:
> On Wed, Jun 19, 2013 at 05:21:15PM +0100, Matthew Garrett wrote:
> > Yes, kexec needs a different solution.
>
> No need. If we say, "efi=use_11_map", the 1:1 map will be shoved down
> SetVirtualAddressMap. Otherwise the high mappings.
>
> > Because firmware images don't always update all of the pointers, and
> > so will crash if the 1:1 mappings aren't present.
>
> Ok, so it sounds like we want to *always* create both mappings but,
> depending on what we want, to shove down SetVirtualAddressMap a
> different set. And the 1:1 map will be the optional one which we give
> SetVirtualAddressMap only when user wants it, i.e. when booting with
> "efi=1:1_map".
>
> Makes sense?

I think it will work. The only thing I'd worry about is aliasing. This
scheme clearly won't work for any virtually indexed processor (so it's
basically x86 only) but even on Physically Indexed, you do have to make
sure the cache attributes of any given page are the same for all virtual
address aliases. As long as the 1:1 mapping is writeback, I think this
is satisfied.

James

2013-06-19 16:59:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 05:48:22PM +0100, Matthew Garrett wrote:
> > Ok, so it sounds like we want to *always* create both mappings but,
> > depending on what we want, to shove down SetVirtualAddressMap a
> > different set. And the 1:1 map will be the optional one which we give
> > SetVirtualAddressMap only when user wants it, i.e. when booting with
> > "efi=1:1_map".
>
> Yup, I think that sounds ideal.

Crap, I got completely sidetracked. The 1:1 mappings go in a different
pagetable (real_mode_header->trampoline_pgd) than the kernel one
(i.e. init_mm.pgd). However, the ->trampoline_pgd has all mappings
anyway, which means that if we want to do EFI runtime calls with the
high mappings but *also* have the 1:1 mappings established, we should
*always* switch to that pagetable when doing those calls.

hpa, MattF, agreed?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-19 17:26:13

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On 06/19/2013 08:02 AM, Borislav Petkov wrote:
>
> And yet there are the Macs which reportedly cannot stomach this.
>

No, the reports are that if you use the 1:1 map as the primary address
on Macs the drivers fail... not that you can't have a 1:1 map.

-hpa

2013-06-19 17:37:43

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 12:25:42PM -0500, H. Peter Anvin wrote:
> On 06/19/2013 08:02 AM, Borislav Petkov wrote:
> >
> > And yet there are the Macs which reportedly cannot stomach this.
> >
> No, the reports are that if you use the 1:1 map as the primary address
> on Macs the drivers fail... not that you can't have a 1:1 map.

That's what I meant: ... cannot stomach when the 1:1 map is shoved down
SetVirtualAddressMap.

The thing is, if we want to have both the 1:1 map and the high map
during an EFI runtime call, we would need to *always* switch the
pagetable for an EFI runtime call and establish both mappings in
->trampoline_pgd beforehand.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-19 17:38:47

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On 06/19/2013 12:37 PM, Borislav Petkov wrote:
> On Wed, Jun 19, 2013 at 12:25:42PM -0500, H. Peter Anvin wrote:
>> On 06/19/2013 08:02 AM, Borislav Petkov wrote:
>>>
>>> And yet there are the Macs which reportedly cannot stomach this.
>>>
>> No, the reports are that if you use the 1:1 map as the primary address
>> on Macs the drivers fail... not that you can't have a 1:1 map.
>
> That's what I meant: ... cannot stomach when the 1:1 map is shoved down
> SetVirtualAddressMap.
>
> The thing is, if we want to have both the 1:1 map and the high map
> during an EFI runtime call, we would need to *always* switch the
> pagetable for an EFI runtime call and establish both mappings in
> ->trampoline_pgd beforehand.
>

I thought that was the plan?

-hpa

2013-06-19 17:50:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, Jun 19, 2013 at 12:38:24PM -0500, H. Peter Anvin wrote:
> I thought that was the plan?

Well, currently if I'm booted with "efi=1:1_map" I'm creating only the
1:1 mapping in ->trampoline_pgd and switching the pagetable only then.
Otherwise, I'm using the high, ioremapped mappings - i.e., what we have
now.

I guess I can sync the kernel address space into ->trampoline_pgd after
having created the 1:1 mappings and always switch the pagetable later,
after we've done SetVirtualAddressMap.

Which should take care of the EFI boot stub issue too, as I can define
another set of efi_callX which switch the pagetable unconditionally.

Let me see how that pans out...

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-20 09:13:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping


* Matthew Garrett <[email protected]> wrote:

> On Wed, Jun 19, 2013 at 03:04:34PM +0200, Ingo Molnar wrote:
> > * Borislav Petkov <[email protected]> wrote:
> > > And yet there are the Macs which reportedly cannot stomach this.
> >
> > Do we know why?
>
> I got lost in a maze of pointer arithmetic. There seems to be an
> assumption that nvram writes should be forbidden if in runtime mode but
> with pointers still below the phys/virt split, which obviously makes no
> sense but hey.
>
> But, as always, the only reliable thing to do here is to behave as much
> like Windows as possible. [...]

Amen ...

> [...] Which means performing the 1:1 mapping but maintaining the high
> mapping, and passing the high values via SetVirtualAddressMap.

Cool - and supposedly this will work in a Mac environment as well? Would
be very nice to avoid fundamentally fragile system specific quirks for
something as fundamental as the EFI runtime memory mapping model ...

Thanks,

Ingo

2013-06-20 09:15:48

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 11:13:21AM +0200, Ingo Molnar wrote:

> Cool - and supposedly this will work in a Mac environment as well? Would
> be very nice to avoid fundamentally fragile system specific quirks for
> something as fundamental as the EFI runtime memory mapping model ...

Apple is the only case where I'd expect there to be an issue, since they
only started supporting booting Windows via UEFI on very recent systems.
However, unless they're actually sniffing the page tables on UEFI entry,
I can't see any way that this could break things…

--
Matthew Garrett | [email protected]

2013-06-20 09:22:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping


* Matthew Garrett <[email protected]> wrote:

> On Thu, Jun 20, 2013 at 11:13:21AM +0200, Ingo Molnar wrote:
>
> > Cool - and supposedly this will work in a Mac environment as well? Would
> > be very nice to avoid fundamentally fragile system specific quirks for
> > something as fundamental as the EFI runtime memory mapping model ...
>
> Apple is the only case where I'd expect there to be an issue, since they
> only started supporting booting Windows via UEFI on very recent systems.
> However, unless they're actually sniffing the page tables on UEFI entry,
> I can't see any way that this could break things???

Agreed - I was susprised to see that the runtime was able to _break_ in
any way due to 1:1: my assumption was that it can only get better.

But I did not realize that the 1:1 boot flag also changed what was passed
down, which probably explains the breakages.

I'd even argue to not do this whole boot flag thing at all - just
standardize on the Windows compatibility model as closely as possible.

Thanks,

Ingo

2013-06-20 09:33:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 11:22:37AM +0200, Ingo Molnar wrote:
> > > Cool - and supposedly this will work in a Mac environment as well? Would
> > > be very nice to avoid fundamentally fragile system specific quirks for
> > > something as fundamental as the EFI runtime memory mapping model ...
> >
> > Apple is the only case where I'd expect there to be an issue, since they
> > only started supporting booting Windows via UEFI on very recent systems.
> > However, unless they're actually sniffing the page tables on UEFI entry,
> > I can't see any way that this could break things???
>
> Agreed - I was susprised to see that the runtime was able to _break_ in
> any way due to 1:1: my assumption was that it can only get better.
>
> But I did not realize that the 1:1 boot flag also changed what was passed
> down, which probably explains the breakages.

Right, in the next version, the boot flag will influence only what's
being passed down.

> I'd even argue to not do this whole boot flag thing at all - just
> standardize on the Windows compatibility model as closely as possible.

This will break the Macs so maybe we can do

efi=no_11_map

so the Macs can still boot but use the 1:1 map by default.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-20 09:44:57

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 11:33:37AM +0200, Borislav Petkov wrote:
> This will break the Macs so maybe we can do
>
> efi=no_11_map
>
> so the Macs can still boot but use the 1:1 map by default.

I'm going to guess that there are more people running unmodified Linux
kernels on Macs than there are people using kexec, so just pass the high
maps by default and let the enterprise kernels that care about kexec do
something different.

--
Matthew Garrett | [email protected]

2013-06-20 14:53:43

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, 2013-06-20 at 10:44 +0100, Matthew Garrett wrote:
> On Thu, Jun 20, 2013 at 11:33:37AM +0200, Borislav Petkov wrote:
> > This will break the Macs so maybe we can do
> >
> > efi=no_11_map
> >
> > so the Macs can still boot but use the 1:1 map by default.
>
> I'm going to guess that there are more people running unmodified Linux
> kernels on Macs than there are people using kexec, so just pass the high
> maps by default and let the enterprise kernels that care about kexec do
> something different.

Do we have to make this a popularity contest? Parallels is currently in
beta with a ksplice like upgrade involving kexec ... basically you
checkpoint the system to ram, kexec to the new kernel and restore the
images from ram ... it can upgrade a new kernel in a matter of seconds
(and unlike ksplice, it's the correct kernel and can move between major
versions). Since all the components that do this are open source, we
anticipate this will become a common way to maintain servers, so kexec
will see even more use than it currently does.

Can't we detect Macs from some of the UEFI strings at boot time and do
the right thing with the boot switch (which can be overriden from the
kernel command line if we get it wrong)?

James



2013-06-20 16:29:31

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 07:53:39AM -0700, James Bottomley wrote:

> Can't we detect Macs from some of the UEFI strings at boot time and do
> the right thing with the boot switch (which can be overriden from the
> kernel command line if we get it wrong)?

Yes, and then our behaviour differs from Windows and so we'll have
inevitably broken some other system.

--
Matthew Garrett | [email protected]

2013-06-20 16:53:19

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 06:44:46PM +0200, Jiri Kosina wrote:

> So if we properly detect those (and only those), we mimic Windows
> completely, right?

No. Windows passes addresses above the phys/virt split to
SetVirtualAddressMap().

--
Matthew Garrett | [email protected]

2013-06-20 16:54:32

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 09:46:15AM -0700, James Bottomley wrote:

> Unless you can think of the way out of this, we seem to have the stark
> choice of behave like windows or allow kexec. For the server market,
> kexec wins, so either we find a way not to have to make the choice or we
> do something automatic to make it fairly painless.

hpa suggested ensuring that UEFI regions are mapped at fixed high
offsets. Someone who cares about kexec should probably make that happen.

--
Matthew Garrett | [email protected]

2013-06-20 17:01:36

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 05:54:26PM +0100, Matthew Garrett wrote:
> On Thu, Jun 20, 2013 at 09:46:15AM -0700, James Bottomley wrote:
>
> > Unless you can think of the way out of this, we seem to have the stark
> > choice of behave like windows or allow kexec. For the server market,
> > kexec wins, so either we find a way not to have to make the choice or we
> > do something automatic to make it fairly painless.
>
> hpa suggested ensuring that UEFI regions are mapped at fixed high
> offsets. Someone who cares about kexec should probably make that happen.

If we can detect the Macs, we can make this decision automatic. And
since no Mac boots windoze, a single DMI check of the sort "if (Mac)"
should suffice.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-20 16:46:18

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, 2013-06-20 at 17:29 +0100, Matthew Garrett wrote:
> On Thu, Jun 20, 2013 at 07:53:39AM -0700, James Bottomley wrote:
>
> > Can't we detect Macs from some of the UEFI strings at boot time and do
> > the right thing with the boot switch (which can be overriden from the
> > kernel command line if we get it wrong)?
>
> Yes, and then our behaviour differs from Windows and so we'll have
> inevitably broken some other system.

Unless you can think of the way out of this, we seem to have the stark
choice of behave like windows or allow kexec. For the server market,
kexec wins, so either we find a way not to have to make the choice or we
do something automatic to make it fairly painless.

James



2013-06-20 18:08:18

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 06:12:10PM +0100, Matthew Garrett wrote:
> On Thu, Jun 20, 2013 at 07:01:24PM +0200, Borislav Petkov wrote:
>
> > If we can detect the Macs, we can make this decision automatic. And
> > since no Mac boots windoze, a single DMI check of the sort "if (Mac)"
> > should suffice.
>
> Yes, we can special-case Macs. But since our behaviour is then obviously
> different to Windows, we'll inevitably break some other system.

Why different? We'll have the high mappings and shove the 1:1 mappings
down SetVirtualAddressMap by default.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-20 18:10:26

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 08:08:08PM +0200, Borislav Petkov wrote:
> On Thu, Jun 20, 2013 at 06:12:10PM +0100, Matthew Garrett wrote:
> > On Thu, Jun 20, 2013 at 07:01:24PM +0200, Borislav Petkov wrote:
> >
> > > If we can detect the Macs, we can make this decision automatic. And
> > > since no Mac boots windoze, a single DMI check of the sort "if (Mac)"
> > > should suffice.
> >
> > Yes, we can special-case Macs. But since our behaviour is then obviously
> > different to Windows, we'll inevitably break some other system.
>
> Why different? We'll have the high mappings and shove the 1:1 mappings
> down SetVirtualAddressMap by default.

Because Windows passes high addresses to SetVirtualAddressMap(), and
because if you can imagine firmware developers getting it wrong then
firmware developers will have got it wrong.

--
Matthew Garrett | [email protected]

2013-06-20 18:17:39

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 08:14:45PM +0200, Borislav Petkov wrote:
> On Thu, Jun 20, 2013 at 07:10:15PM +0100, Matthew Garrett wrote:
> > Because Windows passes high addresses to SetVirtualAddressMap(), and
> > because if you can imagine firmware developers getting it wrong then
> > firmware developers will have got it wrong.
>
> Can we reversely assume that if we'd used fixed high offsets, as hpa
> suggests, then it'll be fine? IOW, are any high addresses, even fixed
> ones, fine?

Windows actually seems to start at the top of address space and go down
- this is what I get booting Windows 8 under kvm. It looks like very
high addresses are fine, and we're currently using "low" high addresses,
so I suspect we're fine pretty much anywhere in that range.

****** SetVirtualAddressMap
Type: 5
Physical Start: 3E878000
Virtual Start: FFFFFFFFFFBEB000
Number Of Pages: 15
Attributes: 800000000000000F
Type: 6
Physical Start: 3E88D000
Virtual Start: FFFFFFFFFFBD6000
Number Of Pages: 15
Attributes: 800000000000000F
Type: 5
Physical Start: 3FB22000
Virtual Start: FFFFFFFFFFBA6000
Number Of Pages: 30
Attributes: 800000000000000F
Type: 6
Physical Start: 3FB52000
Virtual Start: FFFFFFFFFFB82000
Number Of Pages: 24
Attributes: 800000000000000F
Type: 6
Physical Start: 3FFE0000
Virtual Start: FFFFFFFFFFB62000
Number Of Pages: 20


--
Matthew Garrett | [email protected]

2013-06-20 18:14:57

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 07:10:15PM +0100, Matthew Garrett wrote:
> Because Windows passes high addresses to SetVirtualAddressMap(), and
> because if you can imagine firmware developers getting it wrong then
> firmware developers will have got it wrong.

Can we reversely assume that if we'd used fixed high offsets, as hpa
suggests, then it'll be fine? IOW, are any high addresses, even fixed
ones, fine?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-20 14:35:10

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, 20 Jun 2013, Matthew Garrett wrote:

> > This will break the Macs so maybe we can do
> >
> > efi=no_11_map
> >
> > so the Macs can still boot but use the 1:1 map by default.
>
> I'm going to guess that there are more people running unmodified Linux
> kernels on Macs than there are people using kexec,

I'd be careful in order not to underestimate how much kexec is being used.
[At least some] distros are using it during installation process.

--
Jiri Kosina
SUSE Labs

2013-06-20 17:12:19

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 07:01:24PM +0200, Borislav Petkov wrote:

> If we can detect the Macs, we can make this decision automatic. And
> since no Mac boots windoze, a single DMI check of the sort "if (Mac)"
> should suffice.

Yes, we can special-case Macs. But since our behaviour is then obviously
different to Windows, we'll inevitably break some other system.

--
Matthew Garrett | [email protected]

2013-06-20 18:29:52

by Luck, Tony

[permalink] [raw]
Subject: [PATCH] [IA64] sim: Add casts to avoid assignment warnings

Pointers in the efi_runtime_services_t structure now have type
"void *" (formerly they were "unsigned long"). So we now see a
bunch of warnings like this:

arch/ia64/hp/sim/boot/fw-emu.c:293: warning: assignment makes pointer from integer without a cast

Add (void *) casts to the 10 affected lines to make the build quiet again.

Signed-off-by: Tony Luck <[email protected]>

---

Boris, Matt - Can you add this patch to the same tree that

commit 43ab0476a648053e5998bf081f47f215375a4502 [linux-next id]
efi: Convert runtime services function ptrs

is in so that it will follow along behind it. Thanks.

arch/ia64/hp/sim/boot/fw-emu.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/ia64/hp/sim/boot/fw-emu.c b/arch/ia64/hp/sim/boot/fw-emu.c
index 271f412..87bf9ad 100644
--- a/arch/ia64/hp/sim/boot/fw-emu.c
+++ b/arch/ia64/hp/sim/boot/fw-emu.c
@@ -290,16 +290,16 @@ sys_fw_init (const char *args, int arglen)
efi_runtime->hdr.signature = EFI_RUNTIME_SERVICES_SIGNATURE;
efi_runtime->hdr.revision = EFI_RUNTIME_SERVICES_REVISION;
efi_runtime->hdr.headersize = sizeof(efi_runtime->hdr);
- efi_runtime->get_time = __pa(&fw_efi_get_time);
- efi_runtime->set_time = __pa(&efi_unimplemented);
- efi_runtime->get_wakeup_time = __pa(&efi_unimplemented);
- efi_runtime->set_wakeup_time = __pa(&efi_unimplemented);
- efi_runtime->set_virtual_address_map = __pa(&efi_unimplemented);
- efi_runtime->get_variable = __pa(&efi_unimplemented);
- efi_runtime->get_next_variable = __pa(&efi_unimplemented);
- efi_runtime->set_variable = __pa(&efi_unimplemented);
- efi_runtime->get_next_high_mono_count = __pa(&efi_unimplemented);
- efi_runtime->reset_system = __pa(&efi_reset_system);
+ efi_runtime->get_time = (void *)__pa(&fw_efi_get_time);
+ efi_runtime->set_time = (void *)__pa(&efi_unimplemented);
+ efi_runtime->get_wakeup_time = (void *)__pa(&efi_unimplemented);
+ efi_runtime->set_wakeup_time = (void *)__pa(&efi_unimplemented);
+ efi_runtime->set_virtual_address_map = (void *)__pa(&efi_unimplemented);
+ efi_runtime->get_variable = (void *)__pa(&efi_unimplemented);
+ efi_runtime->get_next_variable = (void *)__pa(&efi_unimplemented);
+ efi_runtime->set_variable = (void *)__pa(&efi_unimplemented);
+ efi_runtime->get_next_high_mono_count = (void *)__pa(&efi_unimplemented);
+ efi_runtime->reset_system = (void *)__pa(&efi_reset_system);

efi_tables->guid = SAL_SYSTEM_TABLE_GUID;
efi_tables->table = __pa(sal_systab);
--
1.8.1.4

2013-06-20 18:47:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 07:17:31PM +0100, Matthew Garrett wrote:
> On Thu, Jun 20, 2013 at 08:14:45PM +0200, Borislav Petkov wrote:
> > On Thu, Jun 20, 2013 at 07:10:15PM +0100, Matthew Garrett wrote:
> > > Because Windows passes high addresses to SetVirtualAddressMap(), and
> > > because if you can imagine firmware developers getting it wrong then
> > > firmware developers will have got it wrong.
> >
> > Can we reversely assume that if we'd used fixed high offsets, as hpa
> > suggests, then it'll be fine? IOW, are any high addresses, even fixed
> > ones, fine?
>
> Windows actually seems to start at the top of address space and go down
> - this is what I get booting Windows 8 under kvm. It looks like very
> high addresses are fine, and we're currently using "low" high addresses,
> so I suspect we're fine pretty much anywhere in that range.
>
> ****** SetVirtualAddressMap
> Type: 5
> Physical Start: 3E878000
> Virtual Start: FFFFFFFFFFBEB000
> Number Of Pages: 15
> Attributes: 800000000000000F
> Type: 6
> Physical Start: 3E88D000
> Virtual Start: FFFFFFFFFFBD6000
> Number Of Pages: 15
> Attributes: 800000000000000F
> Type: 5
> Physical Start: 3FB22000
> Virtual Start: FFFFFFFFFFBA6000
> Number Of Pages: 30
> Attributes: 800000000000000F
> Type: 6
> Physical Start: 3FB52000
> Virtual Start: FFFFFFFFFFB82000
> Number Of Pages: 24
> Attributes: 800000000000000F
> Type: 6
> Physical Start: 3FFE0000
> Virtual Start: FFFFFFFFFFB62000
> Number Of Pages: 20

I guess we can do a top-down allocation, starting from the highest
virtual addresses:

EFI_HIGHEST_ADDRESS
|
| size1
|
--> region1
|
| size2
|
--> region2

...

and we make EFI_HIGHEST_ADDRESS be the same absolute number on every
system.

hpa, is this close to what you had in mind? It would be prudent to
verify whether this will suit well with the kexec virtual space layout
though...

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-20 16:45:59

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, 20 Jun 2013, Matthew Garrett wrote:

> > Can't we detect Macs from some of the UEFI strings at boot time and do
> > the right thing with the boot switch (which can be overriden from the
> > kernel command line if we get it wrong)?
>
> Yes, and then our behaviour differs from Windows

How so? Windows don't work on those older Macs as well, do they?

So if we properly detect those (and only those), we mimic Windows
completely, right?

--
Jiri Kosina
SUSE Labs

2013-06-20 21:27:16

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On 06/19/2013 09:59 AM, Borislav Petkov wrote:
> On Wed, Jun 19, 2013 at 05:48:22PM +0100, Matthew Garrett wrote:
>>> Ok, so it sounds like we want to *always* create both mappings but,
>>> depending on what we want, to shove down SetVirtualAddressMap a
>>> different set. And the 1:1 map will be the optional one which we give
>>> SetVirtualAddressMap only when user wants it, i.e. when booting with
>>> "efi=1:1_map".
>>
>> Yup, I think that sounds ideal.
>
> Crap, I got completely sidetracked. The 1:1 mappings go in a different
> pagetable (real_mode_header->trampoline_pgd) than the kernel one
> (i.e. init_mm.pgd). However, the ->trampoline_pgd has all mappings
> anyway, which means that if we want to do EFI runtime calls with the
> high mappings but *also* have the 1:1 mappings established, we should
> *always* switch to that pagetable when doing those calls.
>
> hpa, MattF, agreed?
>

Yes.

-hpa

2013-06-20 22:36:06

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On 06/20/2013 11:47 AM, Borislav Petkov wrote:
>
> I guess we can do a top-down allocation, starting from the highest
> virtual addresses:
>
> EFI_HIGHEST_ADDRESS
> |
> | size1
> |
> --> region1
> |
> | size2
> |
> --> region2
>
> ...
>
> and we make EFI_HIGHEST_ADDRESS be the same absolute number on every
> system.
>
> hpa, is this close to what you had in mind? It would be prudent to
> verify whether this will suit well with the kexec virtual space layout
> though...
>

This would work really well, I think. The tricky part here is to pick a
safe EFI_HIGHEST_ADDRESS as it is an ABI.

My preference would be to make EFI_HIGHEST_ADDRESS = -4 GB, which is
*not* what Windows uses, but will leave the high negative range clear,
and allows a range where we can grow down without much risk of
interfering with anything else.

-hpa

2013-06-20 23:43:44

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH] [IA64] sim: Add casts to avoid assignment warnings

Oops - pasted in old e-mail address for Boris

On Thu, Jun 20, 2013 at 11:15 AM, Luck, Tony <[email protected]> wrote:
> Pointers in the efi_runtime_services_t structure now have type
> "void *" (formerly they were "unsigned long"). So we now see a
> bunch of warnings like this:
>
> arch/ia64/hp/sim/boot/fw-emu.c:293: warning: assignment makes pointer from integer without a cast
>
> Add (void *) casts to the 10 affected lines to make the build quiet again.
>
> Signed-off-by: Tony Luck <[email protected]>
>
> ---
>
> Boris, Matt - Can you add this patch to the same tree that
>
> commit 43ab0476a648053e5998bf081f47f215375a4502 [linux-next id]
> efi: Convert runtime services function ptrs
>
> is in so that it will follow along behind it. Thanks.
>
> arch/ia64/hp/sim/boot/fw-emu.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/arch/ia64/hp/sim/boot/fw-emu.c b/arch/ia64/hp/sim/boot/fw-emu.c
> index 271f412..87bf9ad 100644
> --- a/arch/ia64/hp/sim/boot/fw-emu.c
> +++ b/arch/ia64/hp/sim/boot/fw-emu.c
> @@ -290,16 +290,16 @@ sys_fw_init (const char *args, int arglen)
> efi_runtime->hdr.signature = EFI_RUNTIME_SERVICES_SIGNATURE;
> efi_runtime->hdr.revision = EFI_RUNTIME_SERVICES_REVISION;
> efi_runtime->hdr.headersize = sizeof(efi_runtime->hdr);
> - efi_runtime->get_time = __pa(&fw_efi_get_time);
> - efi_runtime->set_time = __pa(&efi_unimplemented);
> - efi_runtime->get_wakeup_time = __pa(&efi_unimplemented);
> - efi_runtime->set_wakeup_time = __pa(&efi_unimplemented);
> - efi_runtime->set_virtual_address_map = __pa(&efi_unimplemented);
> - efi_runtime->get_variable = __pa(&efi_unimplemented);
> - efi_runtime->get_next_variable = __pa(&efi_unimplemented);
> - efi_runtime->set_variable = __pa(&efi_unimplemented);
> - efi_runtime->get_next_high_mono_count = __pa(&efi_unimplemented);
> - efi_runtime->reset_system = __pa(&efi_reset_system);
> + efi_runtime->get_time = (void *)__pa(&fw_efi_get_time);
> + efi_runtime->set_time = (void *)__pa(&efi_unimplemented);
> + efi_runtime->get_wakeup_time = (void *)__pa(&efi_unimplemented);
> + efi_runtime->set_wakeup_time = (void *)__pa(&efi_unimplemented);
> + efi_runtime->set_virtual_address_map = (void *)__pa(&efi_unimplemented);
> + efi_runtime->get_variable = (void *)__pa(&efi_unimplemented);
> + efi_runtime->get_next_variable = (void *)__pa(&efi_unimplemented);
> + efi_runtime->set_variable = (void *)__pa(&efi_unimplemented);
> + efi_runtime->get_next_high_mono_count = (void *)__pa(&efi_unimplemented);
> + efi_runtime->reset_system = (void *)__pa(&efi_reset_system);
>
> efi_tables->guid = SAL_SYSTEM_TABLE_GUID;
> efi_tables->table = __pa(sal_systab);
> --
> 1.8.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2013-06-21 06:34:48

by Matt Fleming

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Wed, 19 Jun, at 06:59:02PM, Borislav Petkov wrote:
> On Wed, Jun 19, 2013 at 05:48:22PM +0100, Matthew Garrett wrote:
> > > Ok, so it sounds like we want to *always* create both mappings but,
> > > depending on what we want, to shove down SetVirtualAddressMap a
> > > different set. And the 1:1 map will be the optional one which we give
> > > SetVirtualAddressMap only when user wants it, i.e. when booting with
> > > "efi=1:1_map".
> >
> > Yup, I think that sounds ideal.
>
> Crap, I got completely sidetracked. The 1:1 mappings go in a different
> pagetable (real_mode_header->trampoline_pgd) than the kernel one
> (i.e. init_mm.pgd). However, the ->trampoline_pgd has all mappings
> anyway, which means that if we want to do EFI runtime calls with the
> high mappings but *also* have the 1:1 mappings established, we should
> *always* switch to that pagetable when doing those calls.
>
> hpa, MattF, agreed?

Yep.

--
Matt Fleming, Intel Open Source Technology Center

2013-06-21 07:24:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Thu, Jun 20, 2013 at 03:35:24PM -0700, H. Peter Anvin wrote:
> On 06/20/2013 11:47 AM, Borislav Petkov wrote:
> >
> > I guess we can do a top-down allocation, starting from the highest
> > virtual addresses:
> >
> > EFI_HIGHEST_ADDRESS
> > |
> > | size1
> > |
> > --> region1
> > |
> > | size2
> > |
> > --> region2
> >
> > ...
> >
> > and we make EFI_HIGHEST_ADDRESS be the same absolute number on every
> > system.
> >
> > hpa, is this close to what you had in mind? It would be prudent to
> > verify whether this will suit well with the kexec virtual space layout
> > though...
> >
>
> This would work really well, I think. The tricky part here is to pick a
> safe EFI_HIGHEST_ADDRESS as it is an ABI.
>
> My preference would be to make EFI_HIGHEST_ADDRESS = -4 GB, which is
> *not* what Windows uses, but will leave the high negative range clear,
> and allows a range where we can grow down without much risk of
> interfering with anything else.

Hmm, cool. Let me see whether my primitive math still has it:

-(4 << 30) = 0xffffffff00000000.

Staring at Documentation/x86/x86_64/mm.txt, that's right in the unused
hole, sandwiched between:

ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)

xxxxxxxxxxxxxxxx - ffffffff00000000 (=XX bits, not a lot :-), maybe 4, i.e. 64G) EFI

ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0

Now, if we go and do that, what are we going to say for the lower bound,
in case later someone wants to use some more of the rest of the unused
hole? Should we limit it to say

0xffffffff00000000 -
0xfffffff000000000 = 64G max EFI mappable region.

Or am I too generous? The remaining hole is around

(0xfffffff000000000 - 0xffffeaffffffffff) >> 40 = 20TB.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-21 10:06:06

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

If you cap it you are basically imposing a constraint on the firmware and may not run properly (or at least have to turn off EFI runtime calls with all that implies.) It might be good to have a sanity check but it needs to be pretty generous.

Borislav Petkov <[email protected]> wrote:

>On Thu, Jun 20, 2013 at 03:35:24PM -0700, H. Peter Anvin wrote:
>> On 06/20/2013 11:47 AM, Borislav Petkov wrote:
>> >
>> > I guess we can do a top-down allocation, starting from the highest
>> > virtual addresses:
>> >
>> > EFI_HIGHEST_ADDRESS
>> > |
>> > | size1
>> > |
>> > --> region1
>> > |
>> > | size2
>> > |
>> > --> region2
>> >
>> > ...
>> >
>> > and we make EFI_HIGHEST_ADDRESS be the same absolute number on
>every
>> > system.
>> >
>> > hpa, is this close to what you had in mind? It would be prudent to
>> > verify whether this will suit well with the kexec virtual space
>layout
>> > though...
>> >
>>
>> This would work really well, I think. The tricky part here is to
>pick a
>> safe EFI_HIGHEST_ADDRESS as it is an ABI.
>>
>> My preference would be to make EFI_HIGHEST_ADDRESS = -4 GB, which is
>> *not* what Windows uses, but will leave the high negative range
>clear,
>> and allows a range where we can grow down without much risk of
>> interfering with anything else.
>
>Hmm, cool. Let me see whether my primitive math still has it:
>
>-(4 << 30) = 0xffffffff00000000.
>
>Staring at Documentation/x86/x86_64/mm.txt, that's right in the unused
>hole, sandwiched between:
>
>ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
>
>xxxxxxxxxxxxxxxx - ffffffff00000000 (=XX bits, not a lot :-), maybe 4,
>i.e. 64G) EFI
>
>ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping,
>from phys 0
>
>Now, if we go and do that, what are we going to say for the lower
>bound,
>in case later someone wants to use some more of the rest of the unused
>hole? Should we limit it to say
>
>0xffffffff00000000 -
>0xfffffff000000000 = 64G max EFI mappable region.
>
>Or am I too generous? The remaining hole is around
>
>(0xfffffff000000000 - 0xffffeaffffffffff) >> 40 = 20TB.
>
>Thanks.

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-06-21 10:26:03

by Matt Fleming

[permalink] [raw]
Subject: Re: [PATCH] [IA64] sim: Add casts to avoid assignment warnings

On 21 June 2013 00:43, Tony Luck <[email protected]> wrote:
> Oops - pasted in old e-mail address for Boris

Applied, thanks Tony.

2013-06-21 11:56:32

by Matt Fleming

[permalink] [raw]
Subject: Re: [PATCH -v2 4/4] x86, efi: Map runtime services 1:1

On Mon, 17 Jun, at 07:50:16PM, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> Due to the braindead design of EFI, we cannot map runtime services more
> than once for the duration of a booted system. Thus, if we want to use
> EFI runtime services in a kexec'ed kernel, maybe the only possible and
> sensible approach would be to map them 1:1 so that when the kexec kernel
> loads, it can simply call those addresses without the need for remapping
> (which doesn't work anyway).
>
> Furthermore, this mapping approach could be of help with b0rked EFI
> implementations for a different set of reasons.
>
> This implementation is 64-bit only for now.
>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> arch/x86/include/asm/efi.h | 67 +++++++++++----
> arch/x86/platform/efi/efi.c | 165 +++++++++++++++++++++++++++++-------
> arch/x86/platform/efi/efi_stub_64.S | 56 ++++++++++++
> 3 files changed, 240 insertions(+), 48 deletions(-)

[...]

> + .macro SWITCH_PGT
> + cmpb $0, efi_use_11_map
> + je 1f;

Actually, this needs to be,

cmpb $0, efi_use_11_map(%rip)

because this code is built into the EFI boot stub which isn't loaded at
a fixed address and needs to be position independent.

--
Matt Fleming, Intel Open Source Technology Center

2013-06-21 14:21:14

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Fri, Jun 21, 2013 at 03:05:30AM -0700, H. Peter Anvin wrote:
> If you cap it you are basically imposing a constraint on the firmware
> and may not run properly (or at least have to turn off EFI runtime
> calls with all that implies.)

I don't want to cap EFI just for the fun of it but rather set a limit
so that the next one who wants a chunk of the virtual address space can
have a reliable limit from where she/he can start. Otherwise we won't
know where EFI reliably ends...

> It might be good to have a sanity check but it needs to be pretty
> generous.

64 Gb generous enough?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-06-21 16:43:10

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On 06/21/2013 07:21 AM, Borislav Petkov wrote:
> On Fri, Jun 21, 2013 at 03:05:30AM -0700, H. Peter Anvin wrote:
>> If you cap it you are basically imposing a constraint on the firmware
>> and may not run properly (or at least have to turn off EFI runtime
>> calls with all that implies.)
>
> I don't want to cap EFI just for the fun of it but rather set a limit
> so that the next one who wants a chunk of the virtual address space can
> have a reliable limit from where she/he can start. Otherwise we won't
> know where EFI reliably ends...
>

We don't... and I don't think there is anything we can do about it. If
some messed-up firmware wants to map a terabyte we either refuse and
don't allow EFI runtime calls on that machine or we accept it.

Fortunately the space is extremely large and with growing down from a
known address it is less likely that we'll conflict with something.

>> It might be good to have a sanity check but it needs to be pretty
>> generous.
>
> 64 Gb generous enough?

Let's start there and see if we run into something that gives us trouble.

-hpa

2013-07-02 07:29:57

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH -v2 0/4] EFI 1:1 mapping

On Fri 2013-06-21 09:42:35, H. Peter Anvin wrote:
> On 06/21/2013 07:21 AM, Borislav Petkov wrote:
> > On Fri, Jun 21, 2013 at 03:05:30AM -0700, H. Peter Anvin wrote:
> >> If you cap it you are basically imposing a constraint on the firmware
> >> and may not run properly (or at least have to turn off EFI runtime
> >> calls with all that implies.)
> >
> > I don't want to cap EFI just for the fun of it but rather set a limit
> > so that the next one who wants a chunk of the virtual address space can
> > have a reliable limit from where she/he can start. Otherwise we won't
> > know where EFI reliably ends...
> >
>
> We don't... and I don't think there is anything we can do about it. If
> some messed-up firmware wants to map a terabyte we either refuse and
> don't allow EFI runtime calls on that machine or we accept it.
>
> Fortunately the space is extremely large and with growing down from a
> known address it is less likely that we'll conflict with something.

Hmm, will not setting up huge areas be slow?

Or we are going to use 2G pages or something?

Or should we make it slow on purpose so that at least server
vendors are discouraged from using big areas? :-)

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-07-03 06:19:36

by joeyli

[permalink] [raw]
Subject: Re: [PATCH -v2 4/4] x86, efi: Map runtime services 1:1

於 一,2013-06-17 於 19:50 +0200,Borislav Petkov 提到:
> +#ifdef CONFIG_X86_64
> + efi_scratch.pgt11 = (pgd_t *)(unsigned long)real_mode_header->trampoline_pgd;
> +#endif
> + efi_use_11_map = true;
> +

I think "efi_use_11_map = true" should moved to before "#endif",
otherwise kernel will build fail on i586 architecture.

I got build error:
arch/x86/platform/efi/efi.c:1074: error: ‘efi_use_11_map’ undeclared (first use in this function)
arch/x86/platform/efi/efi.c:1074: error: (Each undeclared identifier is reported only once
arch/x86/platform/efi/efi.c:1074: error: for each function it appears in.)


Thanks a lot!
Joey Lee