2016-12-22 10:24:04

by Nicolai Stange

Subject: [PATCH v2 1/2] x86/efi: don't allocate memmap through memblock after mm_init()

With commit 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid
copying image data"), efi_bgrt_init() calls into the memblock allocator
through efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init()
has been called.

Indeed, KASAN reports a bad read access later on in
efi_free_boot_services():

BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
at addr ffff88022de12740
Read of size 4 by task swapper/0/0
page:ffffea0008b78480 count:0 mapcount:-127
mapping: (null) index:0x1 flags: 0x5fff8000000000()
[...]
Call Trace:
dump_stack+0x68/0x9f
kasan_report_error+0x4c8/0x500
kasan_report+0x58/0x60
__asan_load4+0x61/0x80
efi_free_boot_services+0xae/0x24c
start_kernel+0x527/0x562
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x157/0x17a
start_cpu+0x5/0x14

The instruction at the given address is the first read from the memmap's
memory, i.e. the read of md->type in efi_free_boot_services().

Note that the writes earlier in efi_arch_mem_reserve() don't splat because
they're done through early_memremap()ed addresses.

So, after memblock is gone, allocations should be done through the "normal"
page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
it from efi_arch_mem_reserve() and from efi_free_boot_services() as well.

Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
Signed-off-by: Nicolai Stange <[email protected]>
---
arch/x86/platform/efi/quirks.c | 4 ++--
drivers/firmware/efi/memmap.c | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/efi.h | 1 +
3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 10aca63a50d7..30031d5293c4 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -214,7 +214,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)

 	new_size = efi.memmap.desc_size * num_entries;

-	new_phys = memblock_alloc(new_size, 0);
+	new_phys = efi_memmap_alloc(num_entries);
 	if (!new_phys) {
 		pr_err("Could not allocate boot services memmap\n");
 		return;
@@ -355,7 +355,7 @@ void __init efi_free_boot_services(void)
 	}

 	new_size = efi.memmap.desc_size * num_entries;
-	new_phys = memblock_alloc(new_size, 0);
+	new_phys = efi_memmap_alloc(num_entries);
 	if (!new_phys) {
 		pr_err("Failed to allocate new EFI memmap\n");
 		return;
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index f03ddecd232b..78686443cb37 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -9,6 +9,44 @@
 #include <linux/efi.h>
 #include <linux/io.h>
 #include <asm/early_ioremap.h>
+#include <linux/memblock.h>
+#include <linux/slab.h>
+
+static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size)
+{
+	return memblock_alloc(size, 0);
+}
+
+static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
+{
+	unsigned int order = get_order(size);
+	struct page *p = alloc_pages(GFP_KERNEL, order);
+
+	if (!p)
+		return 0;
+
+	return PFN_PHYS(page_to_pfn(p));
+}
+
+/**
+ * efi_memmap_alloc - Allocate memory for the EFI memory map
+ * @num_entries: Number of entries in the allocated map.
+ *
+ * Depending on whether mm_init() has already been invoked or not,
+ * either memblock or "normal" page allocation is used.
+ *
+ * Returns the physical address of the allocated memory map on
+ * success, zero on failure.
+ */
+phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
+{
+	unsigned long size = num_entries * efi.memmap.desc_size;
+
+	if (slab_is_available())
+		return __efi_memmap_alloc_late(size);
+
+	return __efi_memmap_alloc_early(size);
+}

 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
diff --git a/include/linux/efi.h b/include/linux/efi.h
index a07a476178cd..0c5420208c40 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -950,6 +950,7 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);

+extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries);
 extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
 extern void __init efi_memmap_unmap(void);
--
2.11.0


2016-12-22 10:24:10

by Nicolai Stange

Subject: [PATCH v2 2/2] efi: efi_mem_reserve(): don't reserve through memblock after mm_init()

Before invoking the arch specific handler, efi_mem_reserve() reserves
the given memory region through memblock.

efi_mem_reserve() can get called after mm_init() though -- through
efi_bgrt_init(), for example. After mm_init(), memblock is dead and should
not be used anymore.

Let efi_mem_reserve() check whether memblock is dead and not do the
reservation if so. Emit a warning from the generic efi_arch_mem_reserve()
in this case: if the architecture doesn't provide any other means of
registering the region as reserved, the operation would be a nop.

Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
Signed-off-by: Nicolai Stange <[email protected]>
---
Changes to v1:
Change the if condition from slab_is_available() to !slab_is_available()
as pointed out by Mika Penttilä at
http://lkml.kernel.org/r/[email protected]
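
For reference, the effect of the flipped test can be tabulated with a
small userspace model. This is only a sketch: the two boolean parameters
stand in for the results of slab_is_available() and
memblock_is_region_reserved(), the function names are hypothetical, and
the v1 form is assumed from the note above rather than copied from the
v1 patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. The booleans replace the kernel calls
 * slab_is_available() and memblock_is_region_reserved(); nothing
 * here touches memblock.
 */

/* v2: reserve only while memblock is still alive (slab not yet up). */
static bool reserve_v2(bool slab_available, bool already_reserved)
{
	return !slab_available && !already_reserved;
}

/* v1 (assumed form): the slab test was inverted. */
static bool reserve_v1(bool slab_available, bool already_reserved)
{
	return slab_available && !already_reserved;
}

/* Self-check of the model: v2 never reserves once slab is available. */
static void check_model(void)
{
	assert(!reserve_v2(true, false));
	assert(!reserve_v2(true, true));
	assert(reserve_v2(false, false));
	/* The assumed v1 form reserved in exactly the wrong phase. */
	assert(reserve_v1(true, false));
	assert(!reserve_v1(false, false));
}
```

In this model, v2 calls memblock_reserve() only for an unreserved region
before the slab allocator is up, whereas the assumed v1 form would have
done so only after memblock is dead.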

drivers/firmware/efi/efi.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 92914801e388..158a8df2f4af 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -403,7 +403,10 @@ u64 __init efi_mem_desc_end(efi_memory_desc_t *md)
 	return end;
 }

-void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
+void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size)
+{
+	WARN(slab_is_available(), "efi_mem_reserve() has no effect");
+}

 /**
  * efi_mem_reserve - Reserve an EFI memory region
@@ -419,7 +422,7 @@ void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
  */
 void __init efi_mem_reserve(phys_addr_t addr, u64 size)
 {
-	if (!memblock_is_region_reserved(addr, size))
+	if (!slab_is_available() && !memblock_is_region_reserved(addr, size))
 		memblock_reserve(addr, size);

 	/*
--
2.11.0

2016-12-23 14:52:48

by Matt Fleming

Subject: Re: [PATCH v2 1/2] x86/efi: don't allocate memmap through memblock after mm_init()

On Thu, 22 Dec, at 11:23:39AM, Nicolai Stange wrote:
> With commit 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid
> copying image data"), efi_bgrt_init() calls into the memblock allocator
> through efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init()
> has been called.
>
> Indeed, KASAN reports a bad read access later on in
> efi_free_boot_services():
>
> BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
> at addr ffff88022de12740
> Read of size 4 by task swapper/0/0
> page:ffffea0008b78480 count:0 mapcount:-127
> mapping: (null) index:0x1 flags: 0x5fff8000000000()
> [...]
> Call Trace:
> dump_stack+0x68/0x9f
> kasan_report_error+0x4c8/0x500
> kasan_report+0x58/0x60
> __asan_load4+0x61/0x80
> efi_free_boot_services+0xae/0x24c
> start_kernel+0x527/0x562
> x86_64_start_reservations+0x24/0x26
> x86_64_start_kernel+0x157/0x17a
> start_cpu+0x5/0x14
>
> The instruction at the given address is the first read from the memmap's
> memory, i.e. the read of md->type in efi_free_boot_services().
>
> Note that the writes earlier in efi_arch_mem_reserve() don't splat because
> they're done through early_memremap()ed addresses.
>
> So, after memblock is gone, allocations should be done through the "normal"
> page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
> it from efi_arch_mem_reserve() and from efi_free_boot_services() as well.
>
> Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
> Signed-off-by: Nicolai Stange <[email protected]>
> ---
> arch/x86/platform/efi/quirks.c | 4 ++--
> drivers/firmware/efi/memmap.c | 38 ++++++++++++++++++++++++++++++++++++++
> include/linux/efi.h | 1 +
> 3 files changed, 41 insertions(+), 2 deletions(-)

Nice catch. Could you also modify efi_fake_memmap() to use your new
efi_memmap_alloc() function for consistency (note that all
memblock_alloc()s should probably be PAGE_SIZE aligned like the
fakemem code)?

2016-12-23 21:14:00

by Nicolai Stange

Subject: Re: [PATCH v2 1/2] x86/efi: don't allocate memmap through memblock after mm_init()

Matt Fleming <[email protected]> writes:

> On Thu, 22 Dec, at 11:23:39AM, Nicolai Stange wrote:
>> So, after memblock is gone, allocations should be done through the "normal"
>> page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
>> it from efi_arch_mem_reserve() and from efi_free_boot_services() as well.
>>
>> Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
>> Signed-off-by: Nicolai Stange <[email protected]>

> Could you also modify efi_fake_memmap() to use your new
> efi_memmap_alloc() function for consistency

Sure.

I'm planning to submit another set of patches addressing the (bounded)
memmap leaking in anything calling efi_memmap_unmap() though. In the
course of doing so, the memmap allocation sites will get touched anyway:
I'll have to store some information about how the memmap's memory has
been obtained.

> (note that all memblock_alloc()s should probably be PAGE_SIZE aligned
> like the fakemem code)?

Ok, but I'd really like to understand why: I can't find anything in
either the code or the UEFI spec requiring this. And up to now,
efi_arch_mem_reserve() as well as efi_free_boot_services() used to do
those unaligned allocations...

In light of this, is there really a necessity for using whole page
allocations after mm_init() or would kmalloc() suffice here?
Provided that the memremap bits get adjusted accordingly, of course.

So, I'm thinking of turning the ->late boolean into a tristate like the
following:

Memory allocated by | Memory mapped through
--------------------|----------------------
memblock | early_memremap
memblock | memremap
kmalloc | -
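
One way such a tristate might be expressed is sketched below as plain
userspace C. All identifiers are hypothetical and appear in none of the
posted patches; only early_memunmap() and memunmap() are existing kernel
functions, and they are referred to here by name only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for the ->late boolean (names are made up). */
enum efi_memmap_origin {
	EFI_MEMMAP_MEMBLOCK_EARLY,	/* memblock, mapped via early_memremap() */
	EFI_MEMMAP_MEMBLOCK_LATE,	/* memblock, mapped via memremap() */
	EFI_MEMMAP_SLAB,		/* kmalloc(), no separate mapping */
};

/* Name of the call that would undo the mapping, or NULL if there is none. */
static const char *efi_memmap_unmap_call(enum efi_memmap_origin origin)
{
	switch (origin) {
	case EFI_MEMMAP_MEMBLOCK_EARLY:
		return "early_memunmap";
	case EFI_MEMMAP_MEMBLOCK_LATE:
		return "memunmap";
	case EFI_MEMMAP_SLAB:
		return NULL;
	}
	assert(!"unknown origin");
	return NULL;
}

/* In this model only a kmalloc()ed map would be released with kfree(). */
static int efi_memmap_uses_kfree(enum efi_memmap_origin origin)
{
	return origin == EFI_MEMMAP_SLAB;
}
```

Recording the origin next to the map would let an unmap path choose both
how to drop the mapping and whether the backing memory can be freed.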

Neglecting slub overhead, the use of kmalloc() over alloc_pages() would
save 4096 - 12*40 == 3616 Bytes on my system with its 12 entries under
/sys/firmware/efi/runtime-map/. Not really critical, but if it comes for
free, why not?
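
That figure can be reproduced with a small userspace sketch. It assumes
4 KiB pages and the 40-byte descriptor size implied by the numbers above;
model_get_order() is a hypothetical stand-in that mimics the rounding of
the kernel's get_order(), not the kernel implementation itself.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  ((size_t)1 << MODEL_PAGE_SHIFT)

/* Smallest order such that (MODEL_PAGE_SIZE << order) >= size. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((MODEL_PAGE_SIZE << order) < size)
		order++;

	return order;
}

/* Bytes left unused when the map is backed by whole pages. */
static size_t model_page_alloc_slack(size_t num_entries, size_t desc_size)
{
	size_t size = num_entries * desc_size;

	return (MODEL_PAGE_SIZE << model_get_order(size)) - size;
}
```

With 12 entries of 40 bytes each, the map needs 480 bytes, fits in a
single order-0 page, and leaves 3616 bytes of that page unused.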


Thanks,

Nicolai