CC'ing x86 folks because this patch touches x86/mm, of which I am no expert.
[Copied from Patch 1]
Ideally, after the kernel assumes control of the platform, firmware shouldn't
access EFI boot services code/data regions. However, it has been observed that
this is not the case on many x86 platforms. Hence, during boot, the kernel
reserves EFI boot services code/data regions [1] and maps [2] them into efi_pgd
so that the call to set_virtual_address_map() doesn't fail. After returning from
set_virtual_address_map(), the kernel frees the reserved regions [3] but they
still remain mapped.
This means that any code running in the efi_pgd address space (e.g. any EFI
runtime service) would still be able to access EFI boot services code/data
regions, but the contents of these regions would have long been overwritten by
someone else because they are freed by efi_free_boot_services(). So, it's
important to unmap these regions. Once EFI boot services code/data regions are
unmapped, any illegal access to them by buggy firmware results in a page fault,
which is handled by the EFI-specific fault handler.
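To give an idea of how that reaction looks, here is a rough, simplified sketch
only; the real logic lives in efi_recover_from_page_fault() and may differ in
detail (the function name below is made up for illustration):

/*
 * Simplified sketch of how the EFI fault handler is expected to react when
 * firmware touches a boot services region after it has been unmapped.
 * Illustrative only -- not the actual efi_recover_from_page_fault().
 */
static void sketch_efi_fault_reaction(unsigned long phys_addr)
{
        /* Only boot services code/data regions are of interest here */
        if (efi_mem_type(phys_addr) != EFI_BOOT_SERVICES_CODE &&
            efi_mem_type(phys_addr) != EFI_BOOT_SERVICES_DATA)
                return;

        pr_err("Buggy firmware accessed EFI boot services region at 0x%lx\n",
               phys_addr);

        /* Don't trust this firmware with runtime services anymore */
        clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
}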
Unmapping EFI boot services code/data regions results in clearing the
PAGE_PRESENT bit, and it shouldn't bother the L1TF cases because those are
already handled by protnone_mask() in arch/x86/include/asm/pgtable-invert.h.
[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region()
[3] Please see efi_free_boot_services()
Testing the patch set:
----------------------
1. Download the buggy firmware (which accesses boot services regions even after
the kernel has booted) from here [1].
2. Without the patch set, you shouldn't see any kernel warning/error messages
(i.e. the kernel allows accesses to EFI boot services code/data regions even
after the call to set_virtual_address_map()).
3. With the patch set, you should see a kernel warning about the buggy firmware,
efi_rts_wq being frozen, and runtime services being disabled forever.
Please note that this patch will change the kernel's existing behavior for some
EFI runtime services, but I think that's OK because the kernel should never have
allowed those accesses in the first place.
Also, please note that this patch set needs a lot of real-world testing, as I
have only tried it out with OVMF.
Note:
-----
The patch set is based on the "next" branch of the efi tree.
Changes from v1 -> v2:
----------------------
1. Rewrite the cpa initializer in a more readable fashion.
2. Don't use cpa->pfn while unmapping, as it's not useful.
3. Unmap regions before freeing them up.
4. Fix spelling nits.
Sai Praneeth (2):
x86/efi: Unmap EFI boot services code/data regions from efi_pgd
x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86
arch/x86/include/asm/efi.h | 2 ++
arch/x86/include/asm/pgtable_types.h | 2 ++
arch/x86/mm/pageattr.c | 26 ++++++++++++++++++++++++++
arch/x86/platform/efi/efi.c | 2 ++
arch/x86/platform/efi/quirks.c | 25 +++++++++++++++++++++++++
include/linux/efi.h | 3 ---
init/main.c | 4 ----
7 files changed, 57 insertions(+), 7 deletions(-)
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
--
2.19.1
efi_<reserve/free>_boot_services() are x86-specific quirks and as such should
live in asm/efi.h, so move them there from linux/efi.h. Also, call
efi_free_boot_services() from __efi_enter_virtual_mode(), as it is an
x86-specific call and ideally shouldn't be part of init/main.c.
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/efi.h | 2 ++
arch/x86/platform/efi/efi.c | 2 ++
include/linux/efi.h | 3 ---
init/main.c | 4 ----
4 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..d1e64ac80b9c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,8 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
extern void efi_delete_dummy_variable(void);
extern void efi_switch_mm(struct mm_struct *mm);
extern void efi_recover_from_page_fault(unsigned long phys_addr);
+extern void efi_free_boot_services(void);
+extern void efi_reserve_boot_services(void);
struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..93924a353e3b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -994,6 +994,8 @@ static void __init __efi_enter_virtual_mode(void)
panic("EFI call to SetVirtualAddressMap() failed!");
}
+ efi_free_boot_services();
+
/*
* Now that EFI is in virtual mode, update the function
* pointers in the runtime service table to the new virtual addresses.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 845174e113ce..ed2058073385 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1000,13 +1000,11 @@ extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
extern void efi_gettimeofday (struct timespec64 *ts);
extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
#ifdef CONFIG_X86
-extern void efi_free_boot_services(void);
extern efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
bool nonblocking);
extern void efi_find_mirror(void);
#else
-static inline void efi_free_boot_services(void) {}
static inline efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
@@ -1046,7 +1044,6 @@ extern void efi_mem_reserve(phys_addr_t addr, u64 size);
extern int efi_mem_reserve_persistent(phys_addr_t addr, u64 size);
extern void efi_initialize_iomem_resources(struct resource *code_resource,
struct resource *data_resource, struct resource *bss_resource);
-extern void efi_reserve_boot_services(void);
extern int efi_get_fdt_params(struct efi_fdt_params *params);
extern struct kobject *efi_kobj;
diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..174fb14196cc 100644
--- a/init/main.c
+++ b/init/main.c
@@ -731,10 +731,6 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();
- if (efi_enabled(EFI_RUNTIME_SERVICES)) {
- efi_free_boot_services();
- }
-
/* Do the rest non-__init'ed, we're now alive */
rest_init();
}
--
2.19.1
Ideally, after the kernel assumes control of the platform, firmware shouldn't
access EFI boot services code/data regions. However, it has been observed
that this is not the case on many x86 platforms. Hence, during boot, the
kernel reserves EFI boot services code/data regions [1] and maps [2] them
into efi_pgd so that the call to set_virtual_address_map() doesn't fail.
After returning from set_virtual_address_map(), the kernel frees the reserved
regions [3] but they still remain mapped.
This means that any code running in the efi_pgd address space (e.g. any EFI
runtime service) would still be able to access EFI boot services code/data
regions, but the contents of these regions would have long been overwritten
by someone else because they are freed by efi_free_boot_services(). So, it's
important to unmap these regions. Once EFI boot services code/data regions
are unmapped, any illegal access to them by buggy firmware results in a page
fault, which is handled by the EFI-specific fault handler.
Unmapping EFI boot services code/data regions results in clearing the
PAGE_PRESENT bit, and it shouldn't bother the L1TF cases because those are
already handled by protnone_mask() in arch/x86/include/asm/pgtable-invert.h.
[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region()
[3] Please see efi_free_boot_services()
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Bhupesh Sharma <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/pgtable_types.h | 2 ++
arch/x86/mm/pageattr.c | 26 ++++++++++++++++++++++++++
arch/x86/platform/efi/quirks.c | 25 +++++++++++++++++++++++++
3 files changed, 53 insertions(+)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b64acb08a62b..cda04ecf5432 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -566,6 +566,8 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
extern phys_addr_t slow_virt_to_phys(void *__address);
extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
unsigned numpages, unsigned long page_flags);
+extern int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+ unsigned long numpages);
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 51a5a69ecac9..248f16181bed 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2147,6 +2147,32 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
return retval;
}
+int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+ unsigned long numpages)
+{
+ int retval;
+
+ /*
+ * The typical sequence for unmapping is to find a pte through
+ * lookup_address_in_pgd() (ideally, it should never return NULL because
+ * the address is already mapped) and change it's protections.
+ * As pfn is the *target* of a mapping, it's not useful while unmapping.
+ */
+ struct cpa_data cpa = {
+ .vaddr = &address,
+ .pgd = pgd,
+ .numpages = numpages,
+ .mask_set = __pgprot(0),
+ .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
+ .flags = 0,
+ };
+
+ retval = __change_page_attr_set_clr(&cpa, 0);
+ __flush_tlb_all();
+
+ return retval;
+}
+
/*
* The testcases use internal knowledge of the implementation that shouldn't
* be exposed to the rest of the kernel. Include these directly here.
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 669babcaf245..fb1c44b11235 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -370,6 +370,24 @@ void __init efi_reserve_boot_services(void)
}
}
+/*
+ * Apart from having VA mappings for EFI boot services code/data regions,
+ * (duplicate) 1:1 mappings were also created as a quirk for buggy firmware. So,
+ * unmap both 1:1 and VA mappings.
+ */
+static void __init efi_unmap_pages(efi_memory_desc_t *md)
+{
+ pgd_t *pgd = efi_mm.pgd;
+ u64 pa = md->phys_addr;
+ u64 va = md->virt_addr;
+
+ if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
+ pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
+
+ if (kernel_unmap_pages_in_pgd(pgd, va, md->num_pages))
+ pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
+}
+
void __init efi_free_boot_services(void)
{
phys_addr_t new_phys, new_size;
@@ -394,6 +412,13 @@ void __init efi_free_boot_services(void)
continue;
}
+ /*
+ * Before calling set_virtual_address_map(), EFI boot services
+ * code/data regions were mapped as a quirk for buggy firmware.
+ * Unmap them from efi_pgd before freeing them up.
+ */
+ efi_unmap_pages(md);
+
/*
* Nasty quirk: if all sub-1MB memory is used for boot
* services, we can get here without having allocated the
--
2.19.1
* Sai Praneeth Prakhya <[email protected]> wrote:
> Ideally, after kernel assumes control of the platform, firmware shouldn't
> access EFI boot services code/data regions. But, it's noticed that this is
> not so true in many x86 platforms. Hence, during boot, kernel reserves EFI
> boot services code/data regions [1] and maps [2] them to efi_pgd so that
> call to set_virtual_address_map() doesn't fail. After returning from
> set_virtual_address_map(), kernel frees the reserved regions [3] but they
> still remain mapped.
>
> This means that any code that's running in efi_pgd address space (e.g: any
> EFI runtime service) would still be able to access EFI boot services
> code/data regions but the contents of these regions would have long been
> over written by someone else as they are freed by efi_free_boot_services().
> So, it's important to unmap these regions. After unmapping EFI boot
> services code/data regions, any illegal access by buggy firmware to these
> regions would result in page fault which will be handled by efi specific
> fault handler.
>
> Unmapping EFI boot services code/data regions will result in clearing
> PAGE_PRESENT bit and it shouldn't bother L1TF cases because it's already
> handled by protnone_mask() at arch/x86/include/asm/pgtable-invert.h.
>
> [1] Please see efi_reserve_boot_services()
> [2] Please see efi_map_region() -> __map_region()
> [3] Please see efi_free_boot_services()
>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Bhupesh Sharma <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/include/asm/pgtable_types.h | 2 ++
> arch/x86/mm/pageattr.c | 26 ++++++++++++++++++++++++++
> arch/x86/platform/efi/quirks.c | 25 +++++++++++++++++++++++++
> 3 files changed, 53 insertions(+)
>
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index b64acb08a62b..cda04ecf5432 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -566,6 +566,8 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
> extern phys_addr_t slow_virt_to_phys(void *__address);
> extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> unsigned numpages, unsigned long page_flags);
> +extern int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
> + unsigned long numpages);
> #endif /* !__ASSEMBLY__ */
>
> #endif /* _ASM_X86_PGTABLE_DEFS_H */
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 51a5a69ecac9..248f16181bed 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -2147,6 +2147,32 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> return retval;
> }
>
> +int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
> + unsigned long numpages)
> +{
> + int retval;
> +
> + /*
> + * The typical sequence for unmapping is to find a pte through
> + * lookup_address_in_pgd() (ideally, it should never return NULL because
> + * the address is already mapped) and change it's protections.
> + * As pfn is the *target* of a mapping, it's not useful while unmapping.
> + */
> + struct cpa_data cpa = {
> + .vaddr = &address,
> + .pgd = pgd,
> + .numpages = numpages,
> + .mask_set = __pgprot(0),
> + .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> + .flags = 0,
> + };
> +
> + retval = __change_page_attr_set_clr(&cpa, 0);
> + __flush_tlb_all();
So, just to clarify, 'pfn' is kept at 0 here? Might make sense to write
it out explicitly like 'flags', even if it's not used by
__change_page_attr_set_clr().
> +
> + return retval;
> +}
> +
> /*
> * The testcases use internal knowledge of the implementation that shouldn't
> * be exposed to the rest of the kernel. Include these directly here.
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 669babcaf245..fb1c44b11235 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -370,6 +370,24 @@ void __init efi_reserve_boot_services(void)
> }
> }
>
> +/*
> + * Apart from having VA mappings for EFI boot services code/data regions,
> + * (duplicate) 1:1 mappings were also created as a quirk for buggy firmware. So,
> + * unmap both 1:1 and VA mappings.
> + */
> +static void __init efi_unmap_pages(efi_memory_desc_t *md)
> +{
> + pgd_t *pgd = efi_mm.pgd;
> + u64 pa = md->phys_addr;
> + u64 va = md->virt_addr;
> +
> + if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
> + pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
> +
> + if (kernel_unmap_pages_in_pgd(pgd, va, md->num_pages))
> + pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
> +}
> +
> void __init efi_free_boot_services(void)
> {
> phys_addr_t new_phys, new_size;
> @@ -394,6 +412,13 @@ void __init efi_free_boot_services(void)
> continue;
> }
>
> + /*
> + * Before calling set_virtual_address_map(), EFI boot services
> + * code/data regions were mapped as a quirk for buggy firmware.
> + * Unmap them from efi_pgd before freeing them up.
> + */
> + efi_unmap_pages(md);
> +
> /*
> * Nasty quirk: if all sub-1MB memory is used for boot
> * services, we can get here without having allocated the
Assuming nothing breaks and Ard likes it too:
Acked-by: Ingo Molnar <[email protected]>
Thanks,
Ingo
> > +int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
> > + unsigned long numpages)
> > +{
> > + int retval;
> > +
> > + /*
> > + * The typical sequence for unmapping is to find a pte through
> > + * lookup_address_in_pgd() (ideally, it should never return NULL
> because
> > + * the address is already mapped) and change it's protections.
> > + * As pfn is the *target* of a mapping, it's not useful while unmapping.
> > + */
> > + struct cpa_data cpa = {
> > + .vaddr = &address,
> > + .pgd = pgd,
> > + .numpages = numpages,
> > + .mask_set = __pgprot(0),
> > + .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> > + .flags = 0,
> > + };
> > +
> > + retval = __change_page_attr_set_clr(&cpa, 0);
> > + __flush_tlb_all();
>
> So, just to clarify, 'pfn' is kept at 0 here? Might make sense to write it out
> explicitly like 'flags', even if it's not used by __change_page_attr_set_clr().
Sure! Makes sense. I will add it.
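Something like this, then (just a sketch of the v2 initializer with the unused
field written out explicitly, as you suggest):

        struct cpa_data cpa = {
                .vaddr = &address,
                .pfn = 0,
                .pgd = pgd,
                .numpages = numpages,
                .mask_set = __pgprot(0),
                .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
                .flags = 0,
        };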
Regards,
Sai
On Fri, Oct 26, 2018 at 02:38:44PM -0700, Sai Praneeth Prakhya wrote:
> +int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
> + unsigned long numpages)
> +{
> + int retval;
> +
> + /*
> + * The typical sequence for unmapping is to find a pte through
> + * lookup_address_in_pgd() (ideally, it should never return NULL because
> + * the address is already mapped) and change it's protections.
> + * As pfn is the *target* of a mapping, it's not useful while unmapping.
> + */
> + struct cpa_data cpa = {
> + .vaddr = &address,
> + .pgd = pgd,
> + .numpages = numpages,
> + .mask_set = __pgprot(0),
> + .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> + .flags = 0,
> + };
> +
> + retval = __change_page_attr_set_clr(&cpa, 0);
> + __flush_tlb_all();
How is that not a TLB invalidation bug ?
> +
> + return retval;
> +}
Sai,
On Fri, 26 Oct 2018, Sai Praneeth Prakhya wrote:
>
> +int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
> + unsigned long numpages)
> +{
> + int retval;
> +
> + /*
> + * The typical sequence for unmapping is to find a pte through
> + * lookup_address_in_pgd() (ideally, it should never return NULL because
> + * the address is already mapped) and change it's protections.
> + * As pfn is the *target* of a mapping, it's not useful while unmapping.
> + */
> + struct cpa_data cpa = {
> + .vaddr = &address,
> + .pgd = pgd,
> + .numpages = numpages,
> + .mask_set = __pgprot(0),
> + .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> + .flags = 0,
> + };
> +
> + retval = __change_page_attr_set_clr(&cpa, 0);
> + __flush_tlb_all();
So this looks like you copied it from kernel_map_pages_in_pgd() which has
been discussed before to be not sufficient, but it can't be changed right
now due to locking issues.
For this particular use case, this should not be an issue at all, so please
use the proper cpa_flush_*() variant.
> +
> + return retval;
> +}
> +
> /*
> * The testcases use internal knowledge of the implementation that shouldn't
> * be exposed to the rest of the kernel. Include these directly here.
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 669babcaf245..fb1c44b11235 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
While you are at it, can you please split the EFI part out into a separate
patch?
Thanks,
tglx
On Fri, 26 Oct 2018, Sai Praneeth Prakhya wrote:
> efi_<reserve/free>_boot_services() are x86 specific quirks and as such
> should be in asm/efi.h, so move them from linux/efi.h. Also, call
> efi_free_boot_services() from __efi_enter_virtual_mode() as it is x86
> specific call and ideally shouldn't be part of init/main.c
>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Sai,
On Mon, 29 Oct 2018, Thomas Gleixner wrote:
> On Fri, 26 Oct 2018, Sai Praneeth Prakhya wrote:
> >
> > +int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
> > + unsigned long numpages)
> > +{
> > + int retval;
> > +
> > + /*
> > + * The typical sequence for unmapping is to find a pte through
> > + * lookup_address_in_pgd() (ideally, it should never return NULL because
> > + * the address is already mapped) and change it's protections.
> > + * As pfn is the *target* of a mapping, it's not useful while unmapping.
> > + */
> > + struct cpa_data cpa = {
> > + .vaddr = &address,
> > + .pgd = pgd,
> > + .numpages = numpages,
> > + .mask_set = __pgprot(0),
> > + .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> > + .flags = 0,
> > + };
> > +
> > + retval = __change_page_attr_set_clr(&cpa, 0);
> > + __flush_tlb_all();
>
> So this looks like you copied it from kernel_map_pages_in_pgd() which has
> been discussed before to be not sufficient, but it can't be changed right
> now due to locking issues.
Managed to confuse myself. The place which cannot be changed is a different
one, but still for your call site __flush_tlb_all() might not be sufficient.
Thanks,
tglx
Hi Thomas and Peter,
[off the mailing list because I wasn't sure if it's a good idea to spam others with my questions]
> > > + /*
> > > + * The typical sequence for unmapping is to find a pte through
> > > + * lookup_address_in_pgd() (ideally, it should never return NULL
> because
> > > + * the address is already mapped) and change it's protections.
> > > + * As pfn is the *target* of a mapping, it's not useful while unmapping.
> > > + */
> > > + struct cpa_data cpa = {
> > > + .vaddr = &address,
> > > + .pgd = pgd,
> > > + .numpages = numpages,
> > > + .mask_set = __pgprot(0),
> > > + .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> > > + .flags = 0,
> > > + };
> > > +
> > > + retval = __change_page_attr_set_clr(&cpa, 0);
> > > + __flush_tlb_all();
> >
> > So this looks like you copied it from kernel_map_pages_in_pgd() which
> > has been discussed before to be not sufficient, but it can't be
> > changed right now due to locking issues.
>
> Managed to confuse myself. The place which cannot be changed is a different
> one, but still for your call site __flush_tlb_all() might not be sufficient.
Since the mappings are changed, I thought a flush_tlb() might be needed.
But, (to be honest), I don't completely understand the x86/mm code. So, could you
please elaborate the issue more for my better understanding?
So some questions I have are,
1. What did Peter Z mean here? "How is that not a TLB invalidation bug ?"
2. I assumed kernel_map_pages_in_pgd() to be good code, but you said that
it has some locking issues. So, could you please elaborate more on that?
Or, could you please provide me some pointers in the source code that I can take a look at so that I could understand the issue much better.
Regards,
Sai
Sai,
On Mon, 29 Oct 2018, Prakhya, Sai Praneeth wrote:
> Hi Thomas and Peter,
>
> [off the mailing list because I wasn't sure if it's a good idea to spam others with my questions]
In general it's good to do on list because others can learn from the
answers as well.
> > > > + retval = __change_page_attr_set_clr(&cpa, 0);
> > > > + __flush_tlb_all();
> > >
> > > So this looks like you copied it from kernel_map_pages_in_pgd() which
> > > has been discussed before to be not sufficient, but it can't be
> > > changed right now due to locking issues.
> >
> > Managed to confuse myself. The place which cannot be changed is a different
> > one, but still for your call site __flush_tlb_all() might not be sufficient.
>
> Since the mappings are changed, I thought a flush_tlb() might be needed.
> But, (to be honest), I don't completely understand the x86/mm code. So, could you
> please elaborate the issue more for my better understanding?
>
> So some questions I have are,
> 1. What did Peter Z mean here? "How is that not a TLB invalidation bug ?"
__flush_tlb_all() flushes only the mapping on the current CPU. So in a SMP
environment it's not sufficient.
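Purely for illustration, the variant which invalidates the range on all CPUs
would look roughly like this (not saying your patch has to do exactly that):

        /* Illustration: flush the unmapped range on all CPUs, not just the local one */
        flush_tlb_kernel_range(address, address + numpages * PAGE_SIZE);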
> 2. I assumed kernel_map_pages_in_pgd() to be good code, but you said that
> it has some locking issues. So, could you please elaborate more on that?
I corrected myself. It's the other function which cannot use the proper
cpa_flush.*() functions.
> Or, could you please provide me some pointers in the source code that I
> can take a look at so that I could understand the issue much better.
I'll have a second look at the whole thing and reply on list.
Thanks,
tglx
Sai,
On Mon, 29 Oct 2018, Thomas Gleixner wrote:
> On Mon, 29 Oct 2018, Prakhya, Sai Praneeth wrote:
>
> > Hi Thomas and Peter,
> >
> > [off the mailing list because I wasn't sure if it's a good idea to spam others with my questions]
>
> In general it's good to do on list because others can learn from the
> answers as well.
So you somehow managed to keep everyone and the lists in CC
nevertheless. Did not pay attention to the CC list until I got my reply on
the list :)
> > > > > + retval = __change_page_attr_set_clr(&cpa, 0);
> > > > > + __flush_tlb_all();
> > > >
> > > > So this looks like you copied it from kernel_map_pages_in_pgd() which
> > > > has been discussed before to be not sufficient, but it can't be
> > > > changed right now due to locking issues.
> > >
> > > Managed to confuse myself. The place which cannot be changed is a different
> > > one, but still for your call site __flush_tlb_all() might not be sufficient.
> >
> > Since the mappings are changed, I thought a flush_tlb() might be needed.
> > But, (to be honest), I don't completely understand the x86/mm code. So, could you
> > please elaborate the issue more for my better understanding?
> >
> > So some questions I have are,
> > 1. What did Peter Z mean here? "How is that not a TLB invalidation bug ?"
>
> __flush_tlb_all() flushes only the mapping on the current CPU. So in a SMP
> environment it's not sufficient.
>
> > 2. I assumed kernel_map_pages_in_pgd() to be a good code but you said that
> > it has some locking issues. So, could you please elaborate more on that.
>
> I corrected myself. It's the other function which cannot use the proper
> cpa_flush.*() functions.
>
> > Or, could you please provide me some pointers in the source code that I
> > can take a look at so that I could understand the issue much better.
>
> I'll have a second look at the whole thing and reply on list.
So actually for the problem at hand __flush_tlb_all() works, because that
is called way before the secondary CPUs are up.
Ditto for kernel_map_pages_in_pgd(). But both functions want to be marked
__init and aside of that they both should contain a warning when they are
called after the secondary CPUs have been brought up.
Thanks,
tglx
> > In general it's good to do on list because others can learn from the
> > answers as well.
>
> So you somehow managed to keep everyone and the lists in CC nevertheless.
> Did not pay attention to the CC list until I got my reply on the list :)
Sorry! My bad.. yes, the conversation did happen on the list, and that was by accident.
>
> > > > > > + retval = __change_page_attr_set_clr(&cpa, 0);
[........]
> > > So some questions I have are,
> > > 1. What did Peter Z mean here? "How is that not a TLB invalidation bug ?"
> >
> > __flush_tlb_all() flushes only the mapping on the current CPU. So in a
> > SMP environment it's not sufficient.
Thanks for the explanation. The issue makes sense to me now.
[.........]
> > I'll have a second look at the whole thing and reply on list.
>
> So actually for the problem at hand __flush_tlb_all() works, because that is
> called way before the secondary CPUs are up.
>
> Ditto for kernel_map_pages_in_pgd().
Yes, you are right. Both functions are *always* called before SMP
initialization and, as you rightly pointed out, they should be marked __init.
> But both functions want to be marked
> __init and aside of that they both should contain a warning when they are called
> after the secondary CPUs have been brought up.
Makes sense too.. I will include these changes in V3.
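Roughly along these lines for v3 (just a sketch of the direction, not the
final patch; the exact check and error handling may still change):

int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
                                     unsigned long numpages)
{
        int retval;

        struct cpa_data cpa = {
                .vaddr = &address,
                .pfn = 0,
                .pgd = pgd,
                .numpages = numpages,
                .mask_set = __pgprot(0),
                .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
                .flags = 0,
        };

        /*
         * __flush_tlb_all() below only flushes the local CPU, so warn if
         * this ever gets called after the secondary CPUs are up.
         */
        WARN_ON_ONCE(num_online_cpus() > 1);

        retval = __change_page_attr_set_clr(&cpa, 0);
        __flush_tlb_all();

        return retval;
}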
Regards,
Sai
> > /*
> > * The testcases use internal knowledge of the implementation that shouldn't
> > * be exposed to the rest of the kernel. Include these directly here.
> > diff --git a/arch/x86/platform/efi/quirks.c
> > b/arch/x86/platform/efi/quirks.c index 669babcaf245..fb1c44b11235
> > 100644
> > --- a/arch/x86/platform/efi/quirks.c
> > +++ b/arch/x86/platform/efi/quirks.c
>
> While you are at it, can you please split the EFI part out into a separate patch?
Sure! I will do it in V3.
Regards,
Sai