2009-06-16 21:42:36

by Cliff Wickman

Subject: [PATCH] x86: efi/e820 table merge fix

From: Cliff Wickman <[email protected]>

This patch causes all the EFI_RESERVED_TYPE memory reservations to be recorded
in the e820 table as type E820_RESERVED.

(This patch replaces one called 'x86: vendor reserved memory type'.
This version has been discussed a bit with Peter and Yinghai, but they
have not yet given a final opinion.)

Without this patch, EFI_RESERVED_TYPE memory reservations may be
marked usable in the e820 table, so the kernel's use of that memory
may collide with the use intended by whoever reserved it.

(An example use of this functionality is the UV system, which
will access extremely large areas of memory with a memory engine
that allows a user to address beyond the processor's range. Such
areas are reserved in the EFI table by the BIOS.
Some loaders have a restricted number of entries possible in the e820 table,
hence the need to record the reservations in the unrestricted EFI table.)

The call to do_add_efi_memmap() is only made if "add_efi_memmap" is specified
on the kernel command line.

Diffed against 2.6.30-rc8

Signed-off-by: Cliff Wickman <[email protected]>
---
arch/x86/kernel/efi.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)

Index: linux/arch/x86/kernel/efi.c
===================================================================
--- linux.orig/arch/x86/kernel/efi.c
+++ linux/arch/x86/kernel/efi.c
@@ -240,10 +240,35 @@ static void __init do_add_efi_memmap(voi
 		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
 		int e820_type;
 
-		if (md->attribute & EFI_MEMORY_WB)
-			e820_type = E820_RAM;
-		else
+		switch (md->type) {
+		case EFI_LOADER_CODE:
+		case EFI_LOADER_DATA:
+		case EFI_BOOT_SERVICES_CODE:
+		case EFI_BOOT_SERVICES_DATA:
+		case EFI_CONVENTIONAL_MEMORY:
+			if (md->attribute & EFI_MEMORY_WB)
+				e820_type = E820_RAM;
+			else
+				e820_type = E820_RESERVED;
+			break;
+		case EFI_ACPI_RECLAIM_MEMORY:
+			e820_type = E820_ACPI;
+			break;
+		case EFI_ACPI_MEMORY_NVS:
+			e820_type = E820_NVS;
+			break;
+		case EFI_UNUSABLE_MEMORY:
+			e820_type = E820_UNUSABLE;
+			break;
+		default:
+			/*
+			 * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
+			 * EFI_RUNTIME_SERVICES_DATA EFI_MEMORY_MAPPED_IO
+			 * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
+			 */
 			e820_type = E820_RESERVED;
+			break;
+		}
 		e820_add_region(start, size, e820_type);
 	}
 	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
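
As a rough illustration of the mapping that the hunk above introduces, here is a minimal user-space sketch. It is not the kernel code: the EFI type values follow the numbering in the UEFI specification, the E820 values mirror the kernel's definitions, and the function name efi_type_to_e820() is a hypothetical one chosen for this example.

```c
#include <stdint.h>

/* EFI memory descriptor types, numbered as in the UEFI specification. */
enum {
	EFI_RESERVED_TYPE = 0,
	EFI_LOADER_CODE,
	EFI_LOADER_DATA,
	EFI_BOOT_SERVICES_CODE,
	EFI_BOOT_SERVICES_DATA,
	EFI_RUNTIME_SERVICES_CODE,
	EFI_RUNTIME_SERVICES_DATA,
	EFI_CONVENTIONAL_MEMORY,
	EFI_UNUSABLE_MEMORY,
	EFI_ACPI_RECLAIM_MEMORY,
	EFI_ACPI_MEMORY_NVS,
	EFI_MEMORY_MAPPED_IO,
	EFI_MEMORY_MAPPED_IO_PORT_SPACE,
	EFI_PAL_CODE,
};

/* Write-back cacheable attribute bit. */
#define EFI_MEMORY_WB 0x8ULL

/* e820 region types. */
enum {
	E820_RAM = 1,
	E820_RESERVED = 2,
	E820_ACPI = 3,
	E820_NVS = 4,
	E820_UNUSABLE = 5,
};

/*
 * Hypothetical helper (not a kernel function): makes the same decision
 * as the switch statement in the patch, for one descriptor.
 */
static int efi_type_to_e820(uint32_t type, uint64_t attribute)
{
	switch (type) {
	case EFI_LOADER_CODE:
	case EFI_LOADER_DATA:
	case EFI_BOOT_SERVICES_CODE:
	case EFI_BOOT_SERVICES_DATA:
	case EFI_CONVENTIONAL_MEMORY:
		/* Reclaimable types count as RAM only if they are write-back. */
		return (attribute & EFI_MEMORY_WB) ? E820_RAM : E820_RESERVED;
	case EFI_ACPI_RECLAIM_MEMORY:
		return E820_ACPI;
	case EFI_ACPI_MEMORY_NVS:
		return E820_NVS;
	case EFI_UNUSABLE_MEMORY:
		return E820_UNUSABLE;
	default:
		/* Reserved, runtime services, MMIO, PAL code, unknown types. */
		return E820_RESERVED;
	}
}
```

With this sketch, a descriptor of type EFI_RESERVED_TYPE maps to E820_RESERVED whether or not EFI_MEMORY_WB is set, which is the behavior the patch description asks for.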


2009-06-17 01:10:24

by Huang, Ying

Subject: Re: [PATCH] x86: efi/e820 table merge fix

On Wed, 2009-06-17 at 05:43 +0800, Cliff Wickman wrote:
> From: Cliff Wickman <[email protected]>
> --- linux.orig/arch/x86/kernel/efi.c
> +++ linux/arch/x86/kernel/efi.c
> @@ -240,10 +240,35 @@ static void __init do_add_efi_memmap(voi
> unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> int e820_type;
>
> - if (md->attribute & EFI_MEMORY_WB)
> - e820_type = E820_RAM;
> - else
> + switch (md->type) {
> + case EFI_LOADER_CODE:
> + case EFI_LOADER_DATA:
> + case EFI_BOOT_SERVICES_CODE:
> + case EFI_BOOT_SERVICES_DATA:
> + case EFI_CONVENTIONAL_MEMORY:
> + if (md->attribute & EFI_MEMORY_WB)
> + e820_type = E820_RAM;
> + else
> + e820_type = E820_RESERVED;
> + break;

Why would the BIOS mark a memory region without EFI_MEMORY_WB as one of
these types? Is there any example?

Best Regards,
Huang Ying

2009-06-17 01:39:52

by H. Peter Anvin

Subject: Re: [PATCH] x86: efi/e820 table merge fix

Huang Ying wrote:
> On Wed, 2009-06-17 at 05:43 +0800, Cliff Wickman wrote:
>> From: Cliff Wickman <[email protected]>
>> --- linux.orig/arch/x86/kernel/efi.c
>> +++ linux/arch/x86/kernel/efi.c
>> @@ -240,10 +240,35 @@ static void __init do_add_efi_memmap(voi
>> unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
>> int e820_type;
>>
>> - if (md->attribute & EFI_MEMORY_WB)
>> - e820_type = E820_RAM;
>> - else
>> + switch (md->type) {
>> + case EFI_LOADER_CODE:
>> + case EFI_LOADER_DATA:
>> + case EFI_BOOT_SERVICES_CODE:
>> + case EFI_BOOT_SERVICES_DATA:
>> + case EFI_CONVENTIONAL_MEMORY:
>> + if (md->attribute & EFI_MEMORY_WB)
>> + e820_type = E820_RAM;
>> + else
>> + e820_type = E820_RESERVED;
>> + break;
>
> Why does BIOS mark memory region without EFI_MEMORY_WB as these types?
> Any example?
>

Probably not, but if it does, it's broken, and the memory should be
ignored. The original code had the EFI_MEMORY_WB check already, so it
seems prudent to keep it.

-hpa

2009-06-17 01:45:11

by Huang, Ying

Subject: Re: [PATCH] x86: efi/e820 table merge fix

On Wed, 2009-06-17 at 09:38 +0800, H. Peter Anvin wrote:
> Huang Ying wrote:
> > On Wed, 2009-06-17 at 05:43 +0800, Cliff Wickman wrote:
> >> From: Cliff Wickman <[email protected]>
> >> --- linux.orig/arch/x86/kernel/efi.c
> >> +++ linux/arch/x86/kernel/efi.c
> >> @@ -240,10 +240,35 @@ static void __init do_add_efi_memmap(voi
> >> unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> >> int e820_type;
> >>
> >> - if (md->attribute & EFI_MEMORY_WB)
> >> - e820_type = E820_RAM;
> >> - else
> >> + switch (md->type) {
> >> + case EFI_LOADER_CODE:
> >> + case EFI_LOADER_DATA:
> >> + case EFI_BOOT_SERVICES_CODE:
> >> + case EFI_BOOT_SERVICES_DATA:
> >> + case EFI_CONVENTIONAL_MEMORY:
> >> + if (md->attribute & EFI_MEMORY_WB)
> >> + e820_type = E820_RAM;
> >> + else
> >> + e820_type = E820_RESERVED;
> >> + break;
> >
> > Why does BIOS mark memory region without EFI_MEMORY_WB as these types?
> > Any example?
> >
> Probably not, but if it does, it's broken, and the memory should be
> ignored. The original code had the EFI_MEMORY_WB check already, so it
> seems prudent to keep it.

Maybe we need a real-life example for that "fix", attributed to the
vendor in a comment.

Best Regards,
Huang Ying

2009-06-17 04:04:59

by H. Peter Anvin

Subject: Re: [PATCH] x86: efi/e820 table merge fix

Huang Ying wrote:
>>> Why does BIOS mark memory region without EFI_MEMORY_WB as these types?
>>> Any example?
>>>
>> Probably not, but if it does, it's broken, and the memory should be
>> ignored. The original code had the EFI_MEMORY_WB check already, so it
>> seems prudent to keep it.
>
> Maybe we need a real life example for that "fix". And attribute that to
> the vendor in comments.
>
> Best Regards,
> Huang Ying

I think you're reading the patch backwards.

Before the patch, the EFI code didn't look at the type *AT ALL*, it only
looked at the EFI_MEMORY_WB attribute. This broke for SGI when they
were -- correctly -- reserving real memory (and hence still
EFI_MEMORY_WB) with the type set to EFI_RESERVED_TYPE. This is correct
behavior, but the old code saw that it was EFI_MEMORY_WB and therefore
considered it usable RAM. This is obviously broken.
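
The difference between the two rules can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: struct desc is a hypothetical, trimmed-down stand-in for the kernel's efi_memory_desc_t, the helper names usable_before() and usable_after() are invented for this example, and the type values follow the UEFI specification's numbering.

```c
#include <stdbool.h>
#include <stdint.h>

/* UEFI memory type values (only the ones needed here). */
#define EFI_RESERVED_TYPE        0
#define EFI_LOADER_CODE          1
#define EFI_LOADER_DATA          2
#define EFI_BOOT_SERVICES_CODE   3
#define EFI_BOOT_SERVICES_DATA   4
#define EFI_CONVENTIONAL_MEMORY  7

/* Write-back cacheable attribute bit. */
#define EFI_MEMORY_WB            0x8ULL

/* Hypothetical stand-in for efi_memory_desc_t, for illustration only. */
struct desc {
	uint32_t type;
	uint64_t attribute;
};

/* Old rule: the type is never consulted; WB alone makes a region usable. */
static bool usable_before(const struct desc *md)
{
	return (md->attribute & EFI_MEMORY_WB) != 0;
}

/* New rule: the type must be reclaimable and the region must be WB. */
static bool usable_after(const struct desc *md)
{
	switch (md->type) {
	case EFI_LOADER_CODE:
	case EFI_LOADER_DATA:
	case EFI_BOOT_SERVICES_CODE:
	case EFI_BOOT_SERVICES_DATA:
	case EFI_CONVENTIONAL_MEMORY:
		return (md->attribute & EFI_MEMORY_WB) != 0;
	default:
		return false;
	}
}
```

For a descriptor like the SGI one, { EFI_RESERVED_TYPE, EFI_MEMORY_WB }, usable_before() returns true while usable_after() returns false. For ordinary { EFI_CONVENTIONAL_MEMORY, EFI_MEMORY_WB } both return true.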

Now why, you're asking, do we still look at md->attribute at all?
That's where caution dictates that it is imprudent to diverge from the
previous behavior. That question should not be directed at *this*
patch, though, but at the author of the existing code, who appears to
be Paul Jackson of SGI. Unfortunately, his email now bounces and no one
has that information.

If you think about it, though, we don't want to consider it as usable
RAM if it isn't EFI_MEMORY_WB, and it would in fact be a bug (or a
workaround for a broken system) to ignore the attribute. In fact, we go
to great pains elsewhere in the kernel to remove memory which isn't WB
from the usable pool.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-06-17 05:08:35

by Huang, Ying

Subject: Re: [PATCH] x86: efi/e820 table merge fix

On Wed, 2009-06-17 at 12:03 +0800, H. Peter Anvin wrote:
> Huang Ying wrote:
> >>> Why does BIOS mark memory region without EFI_MEMORY_WB as these types?
> >>> Any example?
> >>>
> >> Probably not, but if it does, it's broken, and the memory should be
> >> ignored. The original code had the EFI_MEMORY_WB check already, so it
> >> seems prudent to keep it.
> >
> > Maybe we need a real life example for that "fix". And attribute that to
> > the vendor in comments.
> >
> > Best Regards,
> > Huang Ying
>
> I think you're reading the patch backwards.
>
> Before the patch, the EFI code didn't look at the type *AT ALL*, it only
> looked at the EFI_MEMORY_WB attribute. This broke for SGI when they
> were -- correctly -- reserving real memory (and hence still
> EFI_MEMORY_WB) with the type set to EFI_RESERVED_TYPE. This is correct
> behavior, but the old code saw that it was EFI_MEMORY_WB and therefore
> considered it usable RAM. This is obviously broken.
>
> Now why, you're asking, do we still look at md->attribute at all?
> That's where caution dictates that it is imprudent to diverge from the
> previous behavior, but it is not *this* patch that should be the source
> of that question, but from the author of the existing code, which
> appears to be Paul Jackson of SGI. Unfortunately, his email now bounces
> and no one has that information.

Yes. You are right. Thank you for your patience.

> If you think about it, though, we don't want to consider it as usable
> RAM if it isn't EFI_MEMORY_WB, and it would in fact be a bug (or
> workaround for a broken system) to ignore it. In fact, we go through
> great pains elsewhere in the kernel to remove memory which isn't WB from
> the usable pool.

Because the EFI_MEMORY_WB check can look unnecessary, maybe we should
add a comment explaining why it is there, so that it does not get
deleted later.

Best Regards,
Huang Ying

2009-06-17 14:57:31

by Cliff Wickman

Subject: Re: [PATCH] x86: efi/e820 table merge fix

On Wed, Jun 17, 2009 at 01:08:22PM +0800, Huang Ying wrote:
> On Wed, 2009-06-17 at 12:03 +0800, H. Peter Anvin wrote:
> > Huang Ying wrote:
> > >>> Why does BIOS mark memory region without EFI_MEMORY_WB as these types?
> > >>> Any example?
> > >>>
> > >> Probably not, but if it does, it's broken, and the memory should be
> > >> ignored. The original code had the EFI_MEMORY_WB check already, so it
> > >> seems prudent to keep it.
> > >
> > > Maybe we need a real life example for that "fix". And attribute that to
> > > the vendor in comments.
> > >
> > > Best Regards,
> > > Huang Ying
> >
> > I think you're reading the patch backwards.
> >
> > Before the patch, the EFI code didn't look at the type *AT ALL*, it only
> > looked at the EFI_MEMORY_WB attribute. This broke for SGI when they
> > were -- correctly -- reserving real memory (and hence still
> > EFI_MEMORY_WB) with the type set to EFI_RESERVED_TYPE. This is correct
> > behavior, but the old code saw that it was EFI_MEMORY_WB and therefore
> > considered it usable RAM. This is obviously broken.
> >
> > Now why, you're asking, do we still look at md->attribute at all?
> > That's where caution dictates that it is imprudent to diverge from the
> > previous behavior, but it is not *this* patch that should be the source
> > of that question, but from the author of the existing code, which
> > appears to be Paul Jackson of SGI. Unfortunately, his email now bounces
> > and no one has that information.
>
> Yes. You are right. Thank you for your patience.
>
> > If you think about it, though, we don't want to consider it as usable
> > RAM if it isn't EFI_MEMORY_WB, and it would in fact be a bug (or
> > workaround for a broken system) to ignore it. In fact, we go through
> > great pains elsewhere in the kernel to remove memory which isn't WB from
> > the usable pool.
>
> Because it appears that checking EFI_MEMORY_WB is not necessary, maybe
> it is necessary to add some comments about why it is checked to prevent
> it to be deleted later.

Paul Jackson retired from SGI a while back. I haven't seen him
participating on LKML. But he must have been trying to ensure
that, as Peter says, memory that isn't WB doesn't get into the usable
pool.

I think we are in agreement. I propose the below, with the comment about WB.


---
arch/x86/kernel/efi.c | 35 ++++++++++++++++++++++++++++++++---
1 file changed, 32 insertions(+), 3 deletions(-)

Index: linux/arch/x86/kernel/efi.c
===================================================================
--- linux.orig/arch/x86/kernel/efi.c
+++ linux/arch/x86/kernel/efi.c
@@ -240,10 +240,39 @@ static void __init do_add_efi_memmap(voi
 		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
 		int e820_type;
 
-		if (md->attribute & EFI_MEMORY_WB)
-			e820_type = E820_RAM;
-		else
+		switch (md->type) {
+		case EFI_LOADER_CODE:
+		case EFI_LOADER_DATA:
+		case EFI_BOOT_SERVICES_CODE:
+		case EFI_BOOT_SERVICES_DATA:
+		case EFI_CONVENTIONAL_MEMORY:
+			/*
+			 * make sure that memory that is not write-back does
+			 * not enter the usable memory pool
+			 */
+			if (md->attribute & EFI_MEMORY_WB)
+				e820_type = E820_RAM;
+			else
+				e820_type = E820_RESERVED;
+			break;
+		case EFI_ACPI_RECLAIM_MEMORY:
+			e820_type = E820_ACPI;
+			break;
+		case EFI_ACPI_MEMORY_NVS:
+			e820_type = E820_NVS;
+			break;
+		case EFI_UNUSABLE_MEMORY:
+			e820_type = E820_UNUSABLE;
+			break;
+		default:
+			/*
+			 * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
+			 * EFI_RUNTIME_SERVICES_DATA EFI_MEMORY_MAPPED_IO
+			 * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
+			 */
 			e820_type = E820_RESERVED;
+			break;
+		}
 		e820_add_region(start, size, e820_type);
 	}
 	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);

--
Cliff Wickman
SGI
[email protected]
(651) 683-3824

2009-06-17 18:30:02

by H. Peter Anvin

Subject: Re: [PATCH] x86: efi/e820 table merge fix

Cliff Wickman wrote:
>
> I think we are in agreement. I propose the below, with the comment about WB.
>
>
> ---
> arch/x86/kernel/efi.c | 35 ++++++++++++++++++++++++++++++++---
> 1 file changed, 32 insertions(+), 3 deletions(-)
>

Can you resubmit as an incremental patch instead?

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-06-17 18:31:52

by H. Peter Anvin

Subject: Re: [PATCH] x86: efi/e820 table merge fix

Cliff Wickman wrote:
>
> Paul Jackson retired from SGI a while back. I haven't seen him
> participating in the LKML. But he must have been trying to assure
> that, as Peter says, memory that isn't WB doesn't get into the usable
> pool.
>

Actually, it's not certain; he might just (incorrectly) have thought it
was easier than looking at the memory type. Either way, though, keeping
the check in is the conservative thing to do.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.