2011-06-14 16:19:52

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH v2] x86, efi: Do not reserve boot services regions within reserved areas

Commit 916f676f8dc started reserving boot service code since some systems
require you to keep that code around until SetVirtualAddressMap is called.

However, in some cases those areas will overlap with reserved regions.
The proper medium-term fix is to fix the bootloader to prevent the
conflicts from occurring by moving the kernel to a better position,
but the kernel should check for this possibility, and only reserve regions
which can be reserved.

Signed-off-by: Maarten Lankhorst <[email protected]>

---

V2: Removed some printks and a unrelated change

diff --git a/arch/x86/include/asm/memblock.h b/arch/x86/include/asm/memblock.h
index 19ae14b..0cd3800 100644
--- a/arch/x86/include/asm/memblock.h
+++ b/arch/x86/include/asm/memblock.h
@@ -4,7 +4,6 @@
#define ARCH_DISCARD_MEMBLOCK

u64 memblock_x86_find_in_range_size(u64 start, u64 *sizep, u64 align);
-void memblock_x86_to_bootmem(u64 start, u64 end);

void memblock_x86_reserve_range(u64 start, u64 end, char *name);
void memblock_x86_free_range(u64 start, u64 end);
@@ -19,5 +18,6 @@ u64 memblock_x86_hole_size(u64 start, u64 end);
u64 memblock_x86_find_in_range_node(int nid, u64 start, u64 end, u64 size, u64 align);
u64 memblock_x86_free_memory_in_range(u64 addr, u64 limit);
u64 memblock_x86_memory_in_range(u64 addr, u64 limit);
+bool memblock_x86_check_reserved_size(u64 *addrp, u64 *sizep, u64 align);

#endif
diff --git a/arch/x86/mm/memblock.c b/arch/x86/mm/memblock.c
index aa11693..992da5e 100644
--- a/arch/x86/mm/memblock.c
+++ b/arch/x86/mm/memblock.c
@@ -8,7 +8,7 @@
#include <linux/range.h>

/* Check for already reserved areas */
-static bool __init check_with_memblock_reserved_size(u64 *addrp, u64 *sizep, u64 align)
+bool __init memblock_x86_check_reserved_size(u64 *addrp, u64 *sizep, u64 align)
{
struct memblock_region *r;
u64 addr = *addrp, last;
@@ -59,7 +59,7 @@ u64 __init memblock_x86_find_in_range_size(u64 start, u64 *sizep, u64 align)
if (addr >= ei_last)
continue;
*sizep = ei_last - addr;
- while (check_with_memblock_reserved_size(&addr, sizep, align))
+ while (memblock_x86_check_reserved_size(&addr, sizep, align))
;

if (*sizep)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 0d3a4fa..b8afde4 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -310,14 +310,30 @@ void __init efi_reserve_boot_services(void)

for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
efi_memory_desc_t *md = p;
- unsigned long long start = md->phys_addr;
- unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
+ u64 start = md->phys_addr;
+ u64 size = md->num_pages << EFI_PAGE_SHIFT;

if (md->type != EFI_BOOT_SERVICES_CODE &&
md->type != EFI_BOOT_SERVICES_DATA)
continue;
-
- memblock_x86_reserve_range(start, start + size, "EFI Boot");
+ /* Only reserve where possible:
+ * - Not within any already allocated areas
+ * - Not over any memory area (really needed, if above?)
+ * - Not within any part of the kernel
+ * - Not the bios reserved area
+ */
+ if ((start+size >= virt_to_phys(_text)
+ && start <= virt_to_phys(_end)) ||
+ !e820_all_mapped(start, start+size, E820_RAM) ||
+ memblock_x86_check_reserved_size(&start, &size,
+ 1<<EFI_PAGE_SHIFT)) {
+ /* Could not reserve, skip it */
+ md->num_pages = 0;
+ memblock_dbg(PFX "Could not reserve boot area "
+ "[0x%llx-0x%llx)\n", start, start+size);
+ } else
+ memblock_x86_reserve_range(start, start+size,
+ "EFI Boot");
}
}

@@ -334,6 +350,10 @@ static void __init efi_free_boot_services(void)
md->type != EFI_BOOT_SERVICES_DATA)
continue;

+ /* Could not reserve boot area */
+ if (!size)
+ continue;
+
free_bootmem_late(start, size);
}
}


2011-06-14 17:05:21

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH v2] x86, efi: Do not reserve boot services regions within reserved areas

On Tue, 2011-06-14 at 18:19 +0200, Maarten Lankhorst wrote:
> Commit 916f676f8dc started reserving boot service code since some systems
> require you to keep that code around until SetVirtualAddressMap is called.
[]
> Signed-off-by: Maarten Lankhorst <[email protected]>

Hello Maarten, just trivia.

> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
[]
> + memblock_dbg(PFX "Could not reserve boot area "
> + "[0x%llx-0x%llx)\n", start, start+size);

I believe this should be:

memblock_dbg(PFX "Could not reserve boot area 0x%llx-0x%llx)\n",
start, start + size - 1);

2011-06-14 17:26:07

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [PATCH v2] x86, efi: Do not reserve boot services regions within reserved areas

Op 14-06-11 19:05, Joe Perches schreef:
> On Tue, 2011-06-14 at 18:19 +0200, Maarten Lankhorst wrote:
>> Commit 916f676f8dc started reserving boot service code since some systems
>> require you to keep that code around until SetVirtualAddressMap is called.
> []
>> Signed-off-by: Maarten Lankhorst <[email protected]>
> Hello Maarten, just trivia.
>
>> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> []
>> + memblock_dbg(PFX "Could not reserve boot area "
>> + "[0x%llx-0x%llx)\n", start, start+size);
> I believe this should be:
>
> memblock_dbg(PFX "Could not reserve boot area 0x%llx-0x%llx)\n",
> start, start + size - 1);
Erm, no. [x...y) means the range of x to y without including y.

Other efi code followed that convention, and the e820 code does the same, silently.

[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
...
[ 0.000000] EFI: mem00: type=3, attr=0xf, range=[0x0000000000000000-0x0000000000008000) (0MB)

~Maarten

2011-06-14 17:40:32

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH v2] x86, efi: Do not reserve boot services regions within reserved areas

On Tue, 2011-06-14 at 19:26 +0200, Maarten Lankhorst wrote:
> Op 14-06-11 19:05, Joe Perches schreef:
> > On Tue, 2011-06-14 at 18:19 +0200, Maarten Lankhorst wrote:
> >> Commit 916f676f8dc started reserving boot service code since some systems
> >> require you to keep that code around until SetVirtualAddressMap is called.
> > []
> >> Signed-off-by: Maarten Lankhorst <[email protected]>
> > Hello Maarten, just trivia.
> >
> >> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> > []
> >> + memblock_dbg(PFX "Could not reserve boot area "
> >> + "[0x%llx-0x%llx)\n", start, start+size);
> > I believe this should be:
> >
> > memblock_dbg(PFX "Could not reserve boot area 0x%llx-0x%llx)\n",
> > start, start + size - 1);
> Erm, no. [x...y) means the range of x to y without including y.
>
> Other efi code followed that convention, and the e820 code does the same, silently.
>
> [ 0.000000] BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
> ...
> [ 0.000000] EFI: mem00: type=3, attr=0xf, range=[0x0000000000000000-0x0000000000008000) (0MB)
>
> ~Maarten

The other memblock_dbg uses with range seem to disagree

$ grep -A1 -rP --include=*.[ch] "memblock_dbg.*\%.*llx" *
arch/x86/mm/memblock.c: memblock_dbg(" [%010llx-%010llx]\n", (u64)r->base, (u64)r->base + r->size - 1);
arch/x86/mm/memblock.c- final_start = PFN_DOWN(r->base);
--
arch/x86/mm/memblock.c: memblock_dbg(" memblock_x86_reserve_range: [%#010llx-%#010llx] %16s\n", start, end - 1, name);
arch/x86/mm/memblock.c-
--
arch/x86/mm/memblock.c: memblock_dbg(" memblock_x86_free_range: [%#010llx-%#010llx]\n", start, end - 1);
arch/x86/mm/memblock.c-
--
mm/memblock.c: memblock_dbg("memblock: %s array is doubled to %ld at [%#010llx-%#010llx]",
mm/memblock.c- memblock_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size - 1);


2011-06-14 18:11:26

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH v2] x86, efi: Do not reserve boot services regions within reserved areas

On 06/14/2011 09:19 AM, Maarten Lankhorst wrote:
> Commit 916f676f8dc started reserving boot service code since some systems
> require you to keep that code around until SetVirtualAddressMap is called.
>
> However, in some cases those areas will overlap with reserved regions.
> The proper medium-term fix is to fix the bootloader to prevent the
> conflicts from occurring by moving the kernel to a better position,
> but the kernel should check for this possibility, and only reserve regions
> which can be reserved.
>
> Signed-off-by: Maarten Lankhorst <[email protected]>
>
> ---
>
> V2: Removed some printks and a unrelated change
>
> diff --git a/arch/x86/include/asm/memblock.h b/arch/x86/include/asm/memblock.h
> index 19ae14b..0cd3800 100644
> --- a/arch/x86/include/asm/memblock.h
> +++ b/arch/x86/include/asm/memblock.h
> @@ -4,7 +4,6 @@
> #define ARCH_DISCARD_MEMBLOCK
>
> u64 memblock_x86_find_in_range_size(u64 start, u64 *sizep, u64 align);
> -void memblock_x86_to_bootmem(u64 start, u64 end);
>
> void memblock_x86_reserve_range(u64 start, u64 end, char *name);
> void memblock_x86_free_range(u64 start, u64 end);
> @@ -19,5 +18,6 @@ u64 memblock_x86_hole_size(u64 start, u64 end);
> u64 memblock_x86_find_in_range_node(int nid, u64 start, u64 end, u64 size, u64 align);
> u64 memblock_x86_free_memory_in_range(u64 addr, u64 limit);
> u64 memblock_x86_memory_in_range(u64 addr, u64 limit);
> +bool memblock_x86_check_reserved_size(u64 *addrp, u64 *sizep, u64 align);
>
> #endif
> diff --git a/arch/x86/mm/memblock.c b/arch/x86/mm/memblock.c
> index aa11693..992da5e 100644
> --- a/arch/x86/mm/memblock.c
> +++ b/arch/x86/mm/memblock.c
> @@ -8,7 +8,7 @@
> #include <linux/range.h>
>
> /* Check for already reserved areas */
> -static bool __init check_with_memblock_reserved_size(u64 *addrp, u64 *sizep, u64 align)
> +bool __init memblock_x86_check_reserved_size(u64 *addrp, u64 *sizep, u64 align)
> {
> struct memblock_region *r;
> u64 addr = *addrp, last;
> @@ -59,7 +59,7 @@ u64 __init memblock_x86_find_in_range_size(u64 start, u64 *sizep, u64 align)
> if (addr >= ei_last)
> continue;
> *sizep = ei_last - addr;
> - while (check_with_memblock_reserved_size(&addr, sizep, align))
> + while (memblock_x86_check_reserved_size(&addr, sizep, align))
> ;
>
> if (*sizep)
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 0d3a4fa..b8afde4 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -310,14 +310,30 @@ void __init efi_reserve_boot_services(void)
>
> for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> efi_memory_desc_t *md = p;
> - unsigned long long start = md->phys_addr;
> - unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> + u64 start = md->phys_addr;
> + u64 size = md->num_pages << EFI_PAGE_SHIFT;
>
> if (md->type != EFI_BOOT_SERVICES_CODE &&
> md->type != EFI_BOOT_SERVICES_DATA)
> continue;
> -
> - memblock_x86_reserve_range(start, start + size, "EFI Boot");
> + /* Only reserve where possible:
> + * - Not within any already allocated areas
> + * - Not over any memory area (really needed, if above?)
> + * - Not within any part of the kernel
> + * - Not the bios reserved area
> + */
> + if ((start+size >= virt_to_phys(_text)
> + && start <= virt_to_phys(_end)) ||
> + !e820_all_mapped(start, start+size, E820_RAM) ||
> + memblock_x86_check_reserved_size(&start, &size,
> + 1<<EFI_PAGE_SHIFT)) {
> + /* Could not reserve, skip it */
> + md->num_pages = 0;
> + memblock_dbg(PFX "Could not reserve boot area "
> + "[0x%llx-0x%llx)\n", start, start+size);

how about partial overlapping?

> + } else
> + memblock_x86_reserve_range(start, start+size,
> + "EFI Boot");
> }
> }
>
> @@ -334,6 +350,10 @@ static void __init efi_free_boot_services(void)
> md->type != EFI_BOOT_SERVICES_DATA)
> continue;
>
> + /* Could not reserve boot area */
> + if (!size)
> + continue;
> +
> free_bootmem_late(start, size);
> }
> }
>

2011-06-14 18:32:20

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [PATCH v2] x86, efi: Do not reserve boot services regions within reserved areas

Op 14-06-11 20:10, Yinghai Lu schreef:
> On 06/14/2011 09:19 AM, Maarten Lankhorst wrote:
>> Commit 916f676f8dc started reserving boot service code since some systems
>> require you to keep that code around until SetVirtualAddressMap is called.
>>
>> However, in some cases those areas will overlap with reserved regions.
>> The proper medium-term fix is to fix the bootloader to prevent the
>> conflicts from occurring by moving the kernel to a better position,
>> but the kernel should check for this possibility, and only reserve regions
>> which can be reserved.
>>
>> Signed-off-by: Maarten Lankhorst <[email protected]>
>>
>> ---
>>
>> V2: Removed some printks and a unrelated change
>>
>> diff --git a/arch/x86/include/asm/memblock.h b/arch/x86/include/asm/memblock.h
>> index 19ae14b..0cd3800 100644
>> --- a/arch/x86/include/asm/memblock.h
>> +++ b/arch/x86/include/asm/memblock.h
>> @@ -4,7 +4,6 @@
>> #define ARCH_DISCARD_MEMBLOCK
>>
>> u64 memblock_x86_find_in_range_size(u64 start, u64 *sizep, u64 align);
>> -void memblock_x86_to_bootmem(u64 start, u64 end);
>>
>> void memblock_x86_reserve_range(u64 start, u64 end, char *name);
>> void memblock_x86_free_range(u64 start, u64 end);
>> @@ -19,5 +18,6 @@ u64 memblock_x86_hole_size(u64 start, u64 end);
>> u64 memblock_x86_find_in_range_node(int nid, u64 start, u64 end, u64 size, u64 align);
>> u64 memblock_x86_free_memory_in_range(u64 addr, u64 limit);
>> u64 memblock_x86_memory_in_range(u64 addr, u64 limit);
>> +bool memblock_x86_check_reserved_size(u64 *addrp, u64 *sizep, u64 align);
>>
>> #endif
>> diff --git a/arch/x86/mm/memblock.c b/arch/x86/mm/memblock.c
>> index aa11693..992da5e 100644
>> --- a/arch/x86/mm/memblock.c
>> +++ b/arch/x86/mm/memblock.c
>> @@ -8,7 +8,7 @@
>> #include <linux/range.h>
>>
>> /* Check for already reserved areas */
>> -static bool __init check_with_memblock_reserved_size(u64 *addrp, u64 *sizep, u64 align)
>> +bool __init memblock_x86_check_reserved_size(u64 *addrp, u64 *sizep, u64 align)
>> {
>> struct memblock_region *r;
>> u64 addr = *addrp, last;
>> @@ -59,7 +59,7 @@ u64 __init memblock_x86_find_in_range_size(u64 start, u64 *sizep, u64 align)
>> if (addr >= ei_last)
>> continue;
>> *sizep = ei_last - addr;
>> - while (check_with_memblock_reserved_size(&addr, sizep, align))
>> + while (memblock_x86_check_reserved_size(&addr, sizep, align))
>> ;
>>
>> if (*sizep)
>> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
>> index 0d3a4fa..b8afde4 100644
>> --- a/arch/x86/platform/efi/efi.c
>> +++ b/arch/x86/platform/efi/efi.c
>> @@ -310,14 +310,30 @@ void __init efi_reserve_boot_services(void)
>>
>> for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
>> efi_memory_desc_t *md = p;
>> - unsigned long long start = md->phys_addr;
>> - unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
>> + u64 start = md->phys_addr;
>> + u64 size = md->num_pages << EFI_PAGE_SHIFT;
>>
>> if (md->type != EFI_BOOT_SERVICES_CODE &&
>> md->type != EFI_BOOT_SERVICES_DATA)
>> continue;
>> -
>> - memblock_x86_reserve_range(start, start + size, "EFI Boot");
>> + /* Only reserve where possible:
>> + * - Not within any already allocated areas
>> + * - Not over any memory area (really needed, if above?)
>> + * - Not within any part of the kernel
>> + * - Not the bios reserved area
>> + */
>> + if ((start+size >= virt_to_phys(_text)
>> + && start <= virt_to_phys(_end)) ||
>> + !e820_all_mapped(start, start+size, E820_RAM) ||
>> + memblock_x86_check_reserved_size(&start, &size,
>> + 1<<EFI_PAGE_SHIFT)) {
>> + /* Could not reserve, skip it */
>> + md->num_pages = 0;
>> + memblock_dbg(PFX "Could not reserve boot area "
>> + "[0x%llx-0x%llx)\n", start, start+size);
> how about partial overlapping?
If any overlap is detected I don't reserve the area. mjg is working on a
patch to prevent this from happening, but allocating partial ranges is
simply not worth the effort. Seems it's not useful either, since the
failures I get are range 0-8000 (bios reserved 0-10000), a subset of
what memblock already completely used for its configuration, and the
remainder being completely blocked by the kernel.

The more invasive fix is moving it to grub and hope everyone will
fix their bootloader immediately, but until then this is as good as it
gets. I don't think we'd get enough benefits about reserving partial
ranges, compared to how complicated the code would get.

~Maarten