2018-10-22 01:37:31

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH 0/2] Unmap efi boot services code/data regions after boot.

CC'ing x86 folks because this patch touches x86/mm which I am no expert of.

[Copied from Patch 1]
Ideally, after kernel assumes control of the platform firmware shouldn't
access EFI Boot Services Code/Data regions. But, it's noticed that this
is not so true in many x86 platforms. Hence, during boot, kernel
reserves efi boot services code/data regions [1] and maps [2] them to
efi_pgd so that call to set_virtual_address_map() doesn't fail. After
returning from set_virtual_address_map(), kernel frees the reserved
regions [3] but they still remain mapped.

This means that any code that's running in efi_pgd address space (e.g:
any efi runtime service) would still be able to access efi boot services
code/data regions but the contents of these regions would have long been
over written by someone else as they are freed by efi_free_boot_services().
So, it's important to unmap these regions. After unmapping boot services
code/data regions, any illegal access by buggy firmware to these regions
would result in page fault which will be handled by efi specific fault
handler.

[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region()
[3] Please see efi_free_boot_services()

Testing the patch set:
----------------------
1. Download buggy firmware (which accesses boot regions even after kernel has
booted) from here [1].
2. Without the patch set, you shouldn't see any kernel warning/error
messages (i.e. kernel allows accesses to efi boot services code/data
regions even after call to set_virtual_address_map()).
3. With the patch set, you should see a kernel warning about buggy
firmware, efi_rts_wq beeing freezed and disabling runtime services forever.

Please note that this patch will change kernel's existing behavior for
some efi runtime services but I think it's OK because kernel should have
never allowed those accesses in the first place.

Also please note that this patch set needs lot of real time trashing as
I just tested it out with OVMF.

Note:
-----
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (2):
x86/efi: Unmap efi boot services code/data regions from efi_pgd
x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86

arch/x86/include/asm/efi.h | 2 ++
arch/x86/include/asm/pgtable_types.h | 2 ++
arch/x86/mm/pageattr.c | 21 +++++++++++++++++++++
arch/x86/platform/efi/efi.c | 2 ++
arch/x86/platform/efi/quirks.c | 26 ++++++++++++++++++++++++++
include/linux/efi.h | 3 ---
init/main.c | 4 ----
7 files changed, 53 insertions(+), 7 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>

--
2.7.4



2018-10-22 01:37:32

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH 2/2] x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86

efi_<reserve/free>_boot_services() are x86 specific quirks and as such
should be in asm/efi.h, so move them from linux/efi.h. Also, call
efi_free_boot_services() from __efi_enter_virtual_mode() as it is x86
specific call and ideally shouldn't be part of init/main.c

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/efi.h | 2 ++
arch/x86/platform/efi/efi.c | 2 ++
include/linux/efi.h | 3 ---
init/main.c | 4 ----
4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..d1e64ac80b9c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,8 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
extern void efi_delete_dummy_variable(void);
extern void efi_switch_mm(struct mm_struct *mm);
extern void efi_recover_from_page_fault(unsigned long phys_addr);
+extern void efi_free_boot_services(void);
+extern void efi_reserve_boot_services(void);

struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..93924a353e3b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -994,6 +994,8 @@ static void __init __efi_enter_virtual_mode(void)
panic("EFI call to SetVirtualAddressMap() failed!");
}

+ efi_free_boot_services();
+
/*
* Now that EFI is in virtual mode, update the function
* pointers in the runtime service table to the new virtual addresses.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 845174e113ce..ed2058073385 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1000,13 +1000,11 @@ extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
extern void efi_gettimeofday (struct timespec64 *ts);
extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
#ifdef CONFIG_X86
-extern void efi_free_boot_services(void);
extern efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
bool nonblocking);
extern void efi_find_mirror(void);
#else
-static inline void efi_free_boot_services(void) {}

static inline efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
@@ -1046,7 +1044,6 @@ extern void efi_mem_reserve(phys_addr_t addr, u64 size);
extern int efi_mem_reserve_persistent(phys_addr_t addr, u64 size);
extern void efi_initialize_iomem_resources(struct resource *code_resource,
struct resource *data_resource, struct resource *bss_resource);
-extern void efi_reserve_boot_services(void);
extern int efi_get_fdt_params(struct efi_fdt_params *params);
extern struct kobject *efi_kobj;

diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..174fb14196cc 100644
--- a/init/main.c
+++ b/init/main.c
@@ -731,10 +731,6 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();

- if (efi_enabled(EFI_RUNTIME_SERVICES)) {
- efi_free_boot_services();
- }
-
/* Do the rest non-__init'ed, we're now alive */
rest_init();
}
--
2.7.4


2018-10-22 01:37:44

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd

Ideally, after kernel assumes control of the platform firmware shouldn't
access EFI Boot Services Code/Data regions. But, it's noticed that this
is not so true in many x86 platforms. Hence, during boot, kernel
reserves efi boot services code/data regions [1] and maps [2] them to
efi_pgd so that call to set_virtual_address_map() doesn't fail. After
returning from set_virtual_address_map(), kernel frees the reserved
regions [3] but they still remain mapped.

This means that any code that's running in efi_pgd address space (e.g:
any efi runtime service) would still be able to access efi boot services
code/data regions but the contents of these regions would have long been
over written by someone else as they are freed by efi_free_boot_services().
So, it's important to unmap these regions. After unmapping boot services
code/data regions, any illegal access by buggy firmware to these regions
would result in page fault which will be handled by efi specific fault
handler.

[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region()
[3] Please see efi_free_boot_services()

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/pgtable_types.h | 2 ++
arch/x86/mm/pageattr.c | 21 +++++++++++++++++++++
arch/x86/platform/efi/quirks.c | 26 ++++++++++++++++++++++++++
3 files changed, 49 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b64acb08a62b..796476f11151 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -566,6 +566,8 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
extern phys_addr_t slow_virt_to_phys(void *__address);
extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
unsigned numpages, unsigned long page_flags);
+extern int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+ unsigned long numpages);
#endif /* !__ASSEMBLY__ */

#endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 51a5a69ecac9..b88ed8e91790 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2147,6 +2147,27 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
return retval;
}

+int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+ unsigned long numpages)
+{
+ int retval;
+
+ struct cpa_data cpa = {
+ .vaddr = &address,
+ .pfn = pfn,
+ .pgd = pgd,
+ .numpages = numpages,
+ .mask_set = __pgprot(0),
+ .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
+ .flags = 0,
+ };
+
+ retval = __change_page_attr_set_clr(&cpa, 0);
+ __flush_tlb_all();
+
+ return retval;
+}
+
/*
* The testcases use internal knowledge of the implementation that shouldn't
* be exposed to the rest of the kernel. Include these directly here.
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 669babcaf245..5a1ee9392fcf 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -370,6 +370,25 @@ void __init efi_reserve_boot_services(void)
}
}

+/*
+ * Apart from having VA mappings for efi boot services code/data regions,
+ * (duplicate) 1:1 mappings were also created as a catch for buggy firmware. So,
+ * unmap both 1:1 and VA mappings.
+ */
+static void __init efi_unmap_pages(efi_memory_desc_t *md)
+{
+ pgd_t *pgd = efi_mm.pgd;
+ u64 pfn = md->phys_addr >> PAGE_SHIFT;
+
+ if (kernel_unmap_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages))
+ pr_err("Failed to unmap 1:1 mapping: PA 0x%llx -> VA 0x%llx!\n",
+ md->phys_addr, md->virt_addr);
+
+ if (kernel_unmap_pages_in_pgd(pgd, pfn, md->virt_addr, md->num_pages))
+ pr_err("Failed to unmap VA mapping: PA 0x%llx -> VA 0x%llx!\n",
+ md->phys_addr, md->virt_addr);
+}
+
void __init efi_free_boot_services(void)
{
phys_addr_t new_phys, new_size;
@@ -415,6 +434,13 @@ void __init efi_free_boot_services(void)
}

free_bootmem_late(start, size);
+
+ /*
+ * Before calling set_virtual_address_map(), boot services
+ * code/data regions were mapped as a catch for buggy firmware.
+ * Unmap them from efi_pgd as they have already been freed.
+ */
+ efi_unmap_pages(md);
}

if (!num_entries)
--
2.7.4


2018-10-22 02:13:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd


* Sai Praneeth Prakhya <[email protected]> wrote:

> Ideally, after kernel assumes control of the platform firmware shouldn't
> access EFI Boot Services Code/Data regions. But, it's noticed that this
> is not so true in many x86 platforms. Hence, during boot, kernel
> reserves efi boot services code/data regions [1] and maps [2] them to
> efi_pgd so that call to set_virtual_address_map() doesn't fail. After
> returning from set_virtual_address_map(), kernel frees the reserved
> regions [3] but they still remain mapped.
>
> This means that any code that's running in efi_pgd address space (e.g:
> any efi runtime service) would still be able to access efi boot services
> code/data regions but the contents of these regions would have long been
> over written by someone else as they are freed by efi_free_boot_services().
> So, it's important to unmap these regions. After unmapping boot services
> code/data regions, any illegal access by buggy firmware to these regions
> would result in page fault which will be handled by efi specific fault
> handler.
>
> [1] Please see efi_reserve_boot_services()
> [2] Please see efi_map_region() -> __map_region()
> [3] Please see efi_free_boot_services()
>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/include/asm/pgtable_types.h | 2 ++
> arch/x86/mm/pageattr.c | 21 +++++++++++++++++++++
> arch/x86/platform/efi/quirks.c | 26 ++++++++++++++++++++++++++
> 3 files changed, 49 insertions(+)
>
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index b64acb08a62b..796476f11151 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -566,6 +566,8 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
> extern phys_addr_t slow_virt_to_phys(void *__address);
> extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> unsigned numpages, unsigned long page_flags);
> +extern int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> + unsigned long numpages);
> #endif /* !__ASSEMBLY__ */
>
> #endif /* _ASM_X86_PGTABLE_DEFS_H */
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 51a5a69ecac9..b88ed8e91790 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -2147,6 +2147,27 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> return retval;
> }
>
> +int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> + unsigned long numpages)
> +{
> + int retval;
> +
> + struct cpa_data cpa = {
> + .vaddr = &address,
> + .pfn = pfn,
> + .pgd = pgd,
> + .numpages = numpages,
> + .mask_set = __pgprot(0),
> + .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> + .flags = 0,
> + };
> +
> + retval = __change_page_attr_set_clr(&cpa, 0);
> + __flush_tlb_all();
> +
> + return retval;
> +}

That's certainly a creative use of __change_page_attr_set_clr() by EFI
used for mapping in pages so far (kernel_map_pages_in_pgd()), and now
used for unmapping as well. Doesn't look wrong, just a bit weird as part
of CPA.

Could you please write the initializer in an easier to read fashion:

struct cpa_data cpa = {
.vaddr = &address,
.pfn = pfn,
.pgd = pgd,
.numpages = numpages,
.mask_set = __pgprot(0),
.mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
.flags = 0,
};

?

The one bit that is odd is the cpa->pfn field - for unmapped pages that's
totally uninteresting and I'm wondering whether setting it to 0 wouldn't
be better.

Does the CPU _ever_ look look at the PFN if the page is !_PAGE_PRESENT,
for example speculatively? If yes then what is the recommended value for
the pfn - zero perhaps?

Also note that if for whatever reason the PFN range of the EFI boot area
gets hot-unplugged, we'd have outright invalid PFNs - although this is
probably very unlikely from a platform perspective.

> +/*
> + * Apart from having VA mappings for efi boot services code/data regions,
> + * (duplicate) 1:1 mappings were also created as a catch for buggy firmware. So,
> + * unmap both 1:1 and VA mappings.
> + */

Speling nits:

- please capitalize 'EFI' consistently.
- s/catch/quirk ?

BTW., are the 1:1 'boot mappings' a buggy firmware quirk, or something
required by the EFI spec? (or both? ;-)

> +static void __init efi_unmap_pages(efi_memory_desc_t *md)
> +{
> + pgd_t *pgd = efi_mm.pgd;
> + u64 pfn = md->phys_addr >> PAGE_SHIFT;

Note that this md->phys_addr isn't really meaningful once it gets
unmapped.

> +
> + if (kernel_unmap_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages))
> + pr_err("Failed to unmap 1:1 mapping: PA 0x%llx -> VA 0x%llx!\n",
> + md->phys_addr, md->virt_addr);
> +
> + if (kernel_unmap_pages_in_pgd(pgd, pfn, md->virt_addr, md->num_pages))
> + pr_err("Failed to unmap VA mapping: PA 0x%llx -> VA 0x%llx!\n",
> + md->phys_addr, md->virt_addr);

Please keep pr_err()'s in a single line. (and ignore checkpatch.)

> +}
> +
> void __init efi_free_boot_services(void)
> {
> phys_addr_t new_phys, new_size;
> @@ -415,6 +434,13 @@ void __init efi_free_boot_services(void)
> }
>
> free_bootmem_late(start, size);
> +
> + /*
> + * Before calling set_virtual_address_map(), boot services
> + * code/data regions were mapped as a catch for buggy firmware.
> + * Unmap them from efi_pgd as they have already been freed.
> + */
> + efi_unmap_pages(md);

Ditto.

BTW., the ordering here is wrong: we should unmap any virtual aliases
from pagetables _before_ we free the underlying memory. The ordering is
probably harmless in this case but overall a good practice.

Thanks,

Ingo

2018-10-22 03:01:29

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd

> > +int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> > + unsigned long numpages)
> > +{
> > + int retval;
> > +
> > + struct cpa_data cpa = {
> > + .vaddr = &address,
> > + .pfn = pfn,
> > + .pgd = pgd,
> > + .numpages = numpages,
> > + .mask_set = __pgprot(0),
> > + .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> > + .flags = 0,
> > + };
> > +
> > + retval = __change_page_attr_set_clr(&cpa, 0);
> > + __flush_tlb_all();
> > +
> > + return retval;
> > +}
>
> That's certainly a creative use of __change_page_attr_set_clr() by EFI used for
> mapping in pages so far (kernel_map_pages_in_pgd()), and now used for
> unmapping as well. Doesn't look wrong, just a bit weird as part of CPA.
>

Haha.. yes.. I copied from kernel_map_pages_in_pgd()

> Could you please write the initializer in an easier to read fashion:
>
> struct cpa_data cpa = {
> .vaddr = &address,
> .pfn = pfn,
> .pgd = pgd,
> .numpages = numpages,
> .mask_set = __pgprot(0),
> .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> .flags = 0,
> };
>
> ?

Sure!

>
> The one bit that is odd is the cpa->pfn field - for unmapped pages that's totally
> uninteresting and I'm wondering whether setting it to 0 wouldn't be better.
>
> Does the CPU _ever_ look look at the PFN if the page is !_PAGE_PRESENT, for
> example speculatively? If yes then what is the recommended value for the pfn -
> zero perhaps?
>
> Also note that if for whatever reason the PFN range of the EFI boot area gets
> hot-unplugged, we'd have outright invalid PFNs - although this is probably very
> unlikely from a platform perspective.
>
> > +/*
> > + * Apart from having VA mappings for efi boot services code/data
> > +regions,
> > + * (duplicate) 1:1 mappings were also created as a catch for buggy
> > +firmware. So,
> > + * unmap both 1:1 and VA mappings.
> > + */
>
> Speling nits:
>
> - please capitalize 'EFI' consistently.
> - s/catch/quirk ?
>

Sure! I will fix them

> BTW., are the 1:1 'boot mappings' a buggy firmware quirk, or something
> required by the EFI spec? (or both? ;-)
>

It's a quirk for buggy firmware.
According to EFI spec, EFI Boot Services code/data regions shouldn't be accessed
after calling exit_boot_services(). This call is typically performed by bootloader
(grub) or efi_stub.

> > +static void __init efi_unmap_pages(efi_memory_desc_t *md) {
> > + pgd_t *pgd = efi_mm.pgd;
> > + u64 pfn = md->phys_addr >> PAGE_SHIFT;
>
> Note that this md->phys_addr isn't really meaningful once it gets unmapped.
>

Yes, makes sense. In efi_free_boot_services(), after freeing up the memory and
unmapping, a new memory map is created (which has only EFI Runtime regions)
and hence we can safely assume that this memory descriptor and md->phys_addr
would never be used.

> > +
> > + if (kernel_unmap_pages_in_pgd(pgd, pfn, md->phys_addr, md-
> >num_pages))
> > + pr_err("Failed to unmap 1:1 mapping: PA 0x%llx -> VA
> 0x%llx!\n",
> > + md->phys_addr, md->virt_addr);
> > +
> > + if (kernel_unmap_pages_in_pgd(pgd, pfn, md->virt_addr, md-
> >num_pages))
> > + pr_err("Failed to unmap VA mapping: PA 0x%llx -> VA
> 0x%llx!\n",
> > + md->phys_addr, md->virt_addr);
>
> Please keep pr_err()'s in a single line. (and ignore checkpatch.)
>

Sure!

> > +}
> > +
> > void __init efi_free_boot_services(void) {
> > phys_addr_t new_phys, new_size;
> > @@ -415,6 +434,13 @@ void __init efi_free_boot_services(void)
> > }
> >
> > free_bootmem_late(start, size);
> > +
> > + /*
> > + * Before calling set_virtual_address_map(), boot services
> > + * code/data regions were mapped as a catch for buggy
> firmware.
> > + * Unmap them from efi_pgd as they have already been freed.
> > + */
> > + efi_unmap_pages(md);
>
> Ditto.
>
> BTW., the ordering here is wrong: we should unmap any virtual aliases from
> pagetables _before_ we free the underlying memory. The ordering is probably
> harmless in this case but overall a good practice.

Sure! Makes sense. I will fix it in V2.

Regards,
Sai

2018-10-22 05:00:44

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd

On Sun, Oct 21, 2018 at 6:57 PM Ingo Molnar <[email protected]> wrote:
>
>
> * Sai Praneeth Prakhya <[email protected]> wrote:
>
> > Ideally, after kernel assumes control of the platform firmware shouldn't
> > access EFI Boot Services Code/Data regions. But, it's noticed that this
> > is not so true in many x86 platforms. Hence, during boot, kernel
> > reserves efi boot services code/data regions [1] and maps [2] them to
> > efi_pgd so that call to set_virtual_address_map() doesn't fail. After
> > returning from set_virtual_address_map(), kernel frees the reserved
> > regions [3] but they still remain mapped.
> >
> > This means that any code that's running in efi_pgd address space (e.g:
> > any efi runtime service) would still be able to access efi boot services
> > code/data regions but the contents of these regions would have long been
> > over written by someone else as they are freed by efi_free_boot_services().
> > So, it's important to unmap these regions. After unmapping boot services
> > code/data regions, any illegal access by buggy firmware to these regions
> > would result in page fault which will be handled by efi specific fault
> > handler.
> >
> > [1] Please see efi_reserve_boot_services()
> > [2] Please see efi_map_region() -> __map_region()
> > [3] Please see efi_free_boot_services()
> >
> > Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: Bhupesh Sharma <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Ard Biesheuvel <[email protected]>
> > ---
> > arch/x86/include/asm/pgtable_types.h | 2 ++
> > arch/x86/mm/pageattr.c | 21 +++++++++++++++++++++
> > arch/x86/platform/efi/quirks.c | 26 ++++++++++++++++++++++++++
> > 3 files changed, 49 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> > index b64acb08a62b..796476f11151 100644
> > --- a/arch/x86/include/asm/pgtable_types.h
> > +++ b/arch/x86/include/asm/pgtable_types.h
> > @@ -566,6 +566,8 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
> > extern phys_addr_t slow_virt_to_phys(void *__address);
> > extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> > unsigned numpages, unsigned long page_flags);
> > +extern int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> > + unsigned long numpages);
> > #endif /* !__ASSEMBLY__ */
> >
> > #endif /* _ASM_X86_PGTABLE_DEFS_H */
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index 51a5a69ecac9..b88ed8e91790 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -2147,6 +2147,27 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> > return retval;
> > }
> >
> > +int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> > + unsigned long numpages)
> > +{
> > + int retval;
> > +
> > + struct cpa_data cpa = {
> > + .vaddr = &address,
> > + .pfn = pfn,
> > + .pgd = pgd,
> > + .numpages = numpages,
> > + .mask_set = __pgprot(0),
> > + .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> > + .flags = 0,
> > + };
> > +
> > + retval = __change_page_attr_set_clr(&cpa, 0);
> > + __flush_tlb_all();
> > +
> > + return retval;
> > +}
>
> That's certainly a creative use of __change_page_attr_set_clr() by EFI
> used for mapping in pages so far (kernel_map_pages_in_pgd()), and now
> used for unmapping as well. Doesn't look wrong, just a bit weird as part
> of CPA.
>
> Could you please write the initializer in an easier to read fashion:
>
> struct cpa_data cpa = {
> .vaddr = &address,
> .pfn = pfn,
> .pgd = pgd,
> .numpages = numpages,
> .mask_set = __pgprot(0),
> .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> .flags = 0,
> };
>
> ?
>
> The one bit that is odd is the cpa->pfn field - for unmapped pages that's
> totally uninteresting and I'm wondering whether setting it to 0 wouldn't
> be better.
>
> Does the CPU _ever_ look look at the PFN if the page is !_PAGE_PRESENT,
> for example speculatively? If yes then what is the recommended value for
> the pfn - zero perhaps?
>

This is L1TF, right? Isn't all ones the recommended value?

I would love to see EFI start treating its mm just like any other mm
at some point, though. That is, it should not modify any mappings in
the kernel range, and it could use the regular mm modification APIs
for the user range. But maybe this is a pipe dream.

2018-10-22 14:29:39

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd

On 10/21/2018 06:57 PM, Ingo Molnar wrote:
> Does the CPU _ever_ look look at the PFN if the page is !_PAGE_PRESENT,
> for example speculatively? If yes then what is the recommended value for
> the pfn - zero perhaps?

I'll never say never. :)

For L1TF[1], we know the CPU did exactly this; it ignored the
_PAGE_PRESENT bit when fetching data from the L1. That's what is worked
around with the gunk in arch/x86/include/asm/pgtable-invert.h.

I think Andi plugged the code in here at a low enough level in the page
table manipulation that pageattr.c should inherit it without doing
anything explicit. But, Sai, you might want to double-check this.

1.
https://www.intel.com/content/www/us/en/architecture-and-technology/l1tf.html


2018-10-22 17:54:33

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd

> > The one bit that is odd is the cpa->pfn field - for unmapped pages
> > that's totally uninteresting and I'm wondering whether setting it to 0
> > wouldn't be better.
> >
> > Does the CPU _ever_ look look at the PFN if the page is
> > !_PAGE_PRESENT, for example speculatively? If yes then what is the
> > recommended value for the pfn - zero perhaps?
> >
>
> This is L1TF, right? Isn't all ones the recommended value?
>
> I would love to see EFI start treating its mm just like any other mm at some
> point, though. That is, it should not modify any mappings in the kernel range,
> and it could use the regular mm modification APIs for the user range. But maybe
> this is a pipe dream.

Sounds good to me. I have some more fixes for EFI code and as soon as they are done,
I will start looking into this.

Regards,
Sai

2018-10-22 17:56:38

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd

> On 10/21/2018 06:57 PM, Ingo Molnar wrote:
> > Does the CPU _ever_ look look at the PFN if the page is
> > !_PAGE_PRESENT, for example speculatively? If yes then what is the
> > recommended value for the pfn - zero perhaps?
>
> I'll never say never. :)
>
> For L1TF[1], we know the CPU did exactly this; it ignored the _PAGE_PRESENT
> bit when fetching data from the L1. That's what is worked around with the gunk
> in arch/x86/include/asm/pgtable-invert.h.
>
> I think Andi plugged the code in here at a low enough level in the page table
> manipulation that pageattr.c should inherit it without doing anything explicit.
> But, Sai, you might want to double-check this.
>
> 1.
> https://www.intel.com/content/www/us/en/architecture-and-
> technology/l1tf.html

Sure! Dave. Thanks for the link. I will take a look at it.

Regards,
Sai