2011-06-14 15:03:40

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH] x86, efi: Do not reserve boot services regions within reserved areas

Commit 916f676f8dc started reserving boot service code since some systems
require you to keep that code around until SetVirtualAddressMap is called.

However, in some cases those areas will overlap with reserved regions.
The proper medium-term fix is to fix the bootloader to prevent the
conflicts from occuring by moving the kernel to a better position,
but the kernel should check for this possibility, and only reserve regions
which can be reserved.

Signed-off-by: Maarten Lankhorst <[email protected]>
---

diff --git a/arch/x86/include/asm/memblock.h b/arch/x86/include/asm/memblock.h
index 19ae14b..0cd3800 100644
--- a/arch/x86/include/asm/memblock.h
+++ b/arch/x86/include/asm/memblock.h
@@ -4,7 +4,6 @@
#define ARCH_DISCARD_MEMBLOCK

u64 memblock_x86_find_in_range_size(u64 start, u64 *sizep, u64 align);
-void memblock_x86_to_bootmem(u64 start, u64 end);

void memblock_x86_reserve_range(u64 start, u64 end, char *name);
void memblock_x86_free_range(u64 start, u64 end);
@@ -19,5 +18,6 @@ u64 memblock_x86_hole_size(u64 start, u64 end);
u64 memblock_x86_find_in_range_node(int nid, u64 start, u64 end, u64 size, u64 align);
u64 memblock_x86_free_memory_in_range(u64 addr, u64 limit);
u64 memblock_x86_memory_in_range(u64 addr, u64 limit);
+bool memblock_x86_check_reserved_size(u64 *addrp, u64 *sizep, u64 align);

#endif
diff --git a/arch/x86/mm/memblock.c b/arch/x86/mm/memblock.c
index aa11693..992da5e 100644
--- a/arch/x86/mm/memblock.c
+++ b/arch/x86/mm/memblock.c
@@ -8,7 +8,7 @@
#include <linux/range.h>

/* Check for already reserved areas */
-static bool __init check_with_memblock_reserved_size(u64 *addrp, u64 *sizep, u64 align)
+bool __init memblock_x86_check_reserved_size(u64 *addrp, u64 *sizep, u64 align)
{
struct memblock_region *r;
u64 addr = *addrp, last;
@@ -59,7 +59,7 @@ u64 __init memblock_x86_find_in_range_size(u64 start, u64 *sizep, u64 align)
if (addr >= ei_last)
continue;
*sizep = ei_last - addr;
- while (check_with_memblock_reserved_size(&addr, sizep, align))
+ while (memblock_x86_check_reserved_size(&addr, sizep, align))
;

if (*sizep)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 0d3a4fa..fae8448 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -310,14 +310,32 @@ void __init efi_reserve_boot_services(void)

for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
efi_memory_desc_t *md = p;
- unsigned long long start = md->phys_addr;
- unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
+ u64 start = md->phys_addr;
+ u64 size = md->num_pages << EFI_PAGE_SHIFT;

if (md->type != EFI_BOOT_SERVICES_CODE &&
md->type != EFI_BOOT_SERVICES_DATA)
continue;
-
- memblock_x86_reserve_range(start, start + size, "EFI Boot");
+ /* Only reserve where possible:
+ * - Not within any already allocated areas
+ * - Not over any memory area (really needed, if above?)
+ * - Not within any part of the kernel
+ * - Not the bios reserved area (0x0-0x10000, depending on config
+ */
+ if ((start+size >= virt_to_phys(_text)
+ && start <= virt_to_phys(_end)) ||
+ !e820_all_mapped(start, start+size, E820_RAM) ||
+ memblock_x86_check_reserved_size(&start, &size,
+ 1<<EFI_PAGE_SHIFT)) {
+ /* Could not reserve, skip it */
+ md->num_pages = 0;
+ printk(KERN_INFO PFX "Could not reserve boot area "
+ "[0x%llx-0x%llx)\n", start, start+size);
+ } else {
+ memblock_x86_reserve_range(start, start+size, "EFI Boot");
+ printk(KERN_INFO PFX "Reserved boot area "
+ "[0x%llx-0x%llx)\n", start, start+size);
+ }
}
}

@@ -334,6 +352,12 @@ static void __init efi_free_boot_services(void)
md->type != EFI_BOOT_SERVICES_DATA)
continue;

+ /* Could not reserve boot area */
+ if (!size)
+ continue;
+
+ printk(KERN_INFO PFX "Freeing boot area "
+ "[0x%llx-0x%llx)\n", start, start+size);
free_bootmem_late(start, size);
}
}
@@ -473,7 +497,9 @@ void __init efi_init(void)

if (memmap.desc_size != sizeof(efi_memory_desc_t))
printk(KERN_WARNING
- "Kernel-defined memdesc doesn't match the one from EFI!\n");
+ "Kernel-defined memdesc (%lu) "
+ "doesn't match the one from EFI(%lu)!\n",
+ sizeof(efi_memory_desc_t), memmap.desc_size);

if (add_efi_memmap)
do_add_efi_memmap();


2011-06-14 15:21:16

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] x86, efi: Do not reserve boot services regions within reserved areas

On Tue, Jun 14, 2011 at 05:03:14PM +0200, Maarten Lankhorst wrote:

> + /* Could not reserve boot area */
> + if (!size)
> + continue;
> +
> + printk(KERN_INFO PFX "Freeing boot area "
> + "[0x%llx-0x%llx)\n", start, start+size);

Probably don't need the printk - we could be dumping a few hundred of
those.

> - "Kernel-defined memdesc doesn't match the one from EFI!\n");
> + "Kernel-defined memdesc (%lu) "
> + "doesn't match the one from EFI(%lu)!\n",
> + sizeof(efi_memory_desc_t), memmap.desc_size);

And drop this hunk. It's entirely valid for an implementation to do
this, we should just delete the message entirely.

--
Matthew Garrett | [email protected]

2011-06-14 15:32:36

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [PATCH] x86, efi: Do not reserve boot services regions within reserved areas

Op 14-06-11 17:21, Matthew Garrett schreef:
> On Tue, Jun 14, 2011 at 05:03:14PM +0200, Maarten Lankhorst wrote:
>
>> + /* Could not reserve boot area */
>> + if (!size)
>> + continue;
>> +
>> + printk(KERN_INFO PFX "Freeing boot area "
>> + "[0x%llx-0x%llx)\n", start, start+size);
> Probably don't need the printk - we could be dumping a few hundred of
> those.
Ok, I believe you are right about the succesful ones, since memblock=debug
will let you see those anyhow.

Should I change the unsuccesful ones to use memblock_dbg instead of printk?
In that case you can still debug it if needed. Same question for free path,
change printk to memblock_dbg?
>> - "Kernel-defined memdesc doesn't match the one from EFI!\n");
>> + "Kernel-defined memdesc (%lu) "
>> + "doesn't match the one from EFI(%lu)!\n",
>> + sizeof(efi_memory_desc_t), memmap.desc_size);
> And drop this hunk. It's entirely valid for an implementation to do
> this, we should just delete the message entirely.
Ok. Just wasn't sure what to make of it, since that message kept popping up here.

~Maarten

2011-06-14 15:53:54

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] x86, efi: Do not reserve boot services regions within reserved areas

On Tue, Jun 14, 2011 at 05:31:23PM +0200, Maarten Lankhorst wrote:
> Op 14-06-11 17:21, Matthew Garrett schreef:
> > On Tue, Jun 14, 2011 at 05:03:14PM +0200, Maarten Lankhorst wrote:
> >
> >> + /* Could not reserve boot area */
> >> + if (!size)
> >> + continue;
> >> +
> >> + printk(KERN_INFO PFX "Freeing boot area "
> >> + "[0x%llx-0x%llx)\n", start, start+size);
> > Probably don't need the printk - we could be dumping a few hundred of
> > those.
> Ok, I believe you are right about the succesful ones, since memblock=debug
> will let you see those anyhow.
>
> Should I change the unsuccesful ones to use memblock_dbg instead of printk?
> In that case you can still debug it if needed. Same question for free path,
> change printk to memblock_dbg?

I'd prefer just not printing anything here. We're engaging in a bug
workaround without any guarantee that the bug affects a machine, so
increasing the message load may result in people thinking something's
wrong even when everything's fine. Limiting it to memblock_dbg seems
sane.

--
Matthew Garrett | [email protected]