Subject: arm64/efi handling of persistent memory

Similar to the questions about the arm64 efi boot stub
handing persistent memory, some of the arm64 kernel code
looks fishy.

In arch/arm64/kernel/efi.c:

static int __init is_normal_ram(efi_memory_desc_t *md)
{
if (md->attribute & EFI_MEMORY_WB)
return 1;
return 0;
}

static __init int is_reserve_region(efi_memory_desc_t *md)
{
switch (md->type) {
case EFI_LOADER_CODE:
case EFI_LOADER_DATA:
case EFI_BOOT_SERVICES_CODE:
case EFI_BOOT_SERVICES_DATA:
case EFI_CONVENTIONAL_MEMORY:
case EFI_PERSISTENT_MEMORY:
return 0;
default:
break;
}
return is_normal_ram(md);
}

static __init void reserve_regions(void)
{
...
if (is_normal_ram(md))
early_init_dt_add_memory_arch(paddr, size);

if (is_reserve_region(md)) {
memblock_reserve(paddr, size);
...

static bool __init efi_virtmap_init(void)
{
...
if (!is_normal_ram(md))
prot = __pgprot(PROT_DEVICE_nGnRE);
else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
!PAGE_ALIGNED(md->phys_addr))
prot = PAGE_KERNEL_EXEC;
else
prot = PAGE_KERNEL;

Concerns include:

1. is_normal_ram() will see the WB bit and return 1 regardless
of the type. That seems similar to the arm EFI boot stub issue.
The three callers are shown above.

2. is_reserve_region() treating EFI_PERSISTENT_MEMORY the same
as EFI_CONVENTIONAL_MEMORY looks wrong.

3. We're contemplating working around the grub problem by
reporting EFI_RESERVED_MEMORY plus the NV attribute rather
than EFI_PERSISTENT_MEMORY.

If this is done, then is_reserve_region() will fall through
to is_normal_ram(), which will see the WB bit and return 1.
That seems backwards... but seems correct for persistent
memory, reporting it as a reserved region. That might avoid the
the EFI_PERSISTENT_MEMORY handling problem (if the preceding
call to is_normal_ram() didn't already cause problems).

---
Robert Elliott, HPE Persistent Memory


2015-12-18 11:07:04

by Leif Lindholm

[permalink] [raw]
Subject: Re: arm64/efi handling of persistent memory

On Fri, Dec 18, 2015 at 01:33:25AM +0000, Elliott, Robert (Persistent Memory) wrote:
> Similar to the questions about the arm64 efi boot stub
> handing persistent memory, some of the arm64 kernel code
> looks fishy.
>
> In arch/arm64/kernel/efi.c:
>
> static int __init is_normal_ram(efi_memory_desc_t *md)
> {
> if (md->attribute & EFI_MEMORY_WB)
> return 1;
> return 0;
> }
>
> static __init int is_reserve_region(efi_memory_desc_t *md)
> {
> switch (md->type) {
> case EFI_LOADER_CODE:
> case EFI_LOADER_DATA:
> case EFI_BOOT_SERVICES_CODE:
> case EFI_BOOT_SERVICES_DATA:
> case EFI_CONVENTIONAL_MEMORY:
> case EFI_PERSISTENT_MEMORY:
> return 0;
> default:
> break;
> }
> return is_normal_ram(md);
> }
>
> static __init void reserve_regions(void)
> {
> ...
> if (is_normal_ram(md))
> early_init_dt_add_memory_arch(paddr, size);
>
> if (is_reserve_region(md)) {
> memblock_reserve(paddr, size);
> ...
>
> static bool __init efi_virtmap_init(void)
> {
> ...
> if (!is_normal_ram(md))
> prot = __pgprot(PROT_DEVICE_nGnRE);
> else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
> !PAGE_ALIGNED(md->phys_addr))
> prot = PAGE_KERNEL_EXEC;
> else
> prot = PAGE_KERNEL;
>
> Concerns include:
>
> 1. is_normal_ram() will see the WB bit and return 1 regardless
> of the type. That seems similar to the arm EFI boot stub issue.
> The three callers are shown above.

So, first and third cases look OK to me, but the bit where we add
things to memblock just for being WB is bogus.

> 2. is_reserve_region() treating EFI_PERSISTENT_MEMORY the same
> as EFI_CONVENTIONAL_MEMORY looks wrong.

Yeah... That one was introduced by
ad5fb870c486 ("e820, efi: add ACPI 6.0 persistent memory types")
without any ACKs from ARM people :/

While it probably wouldn't wreck your system, it is unlikely to do
what you'd want.

> 3. We're contemplating working around the grub problem by
> reporting EFI_RESERVED_MEMORY plus the NV attribute rather
> than EFI_PERSISTENT_MEMORY.

That sounds a bit ... nuclear.
Would you then be expecting to retreive information about the NV
device out of hw description, or via PCI, rather than the UEFI memory
map?

> If this is done, then is_reserve_region() will fall through
> to is_normal_ram(), which will see the WB bit and return 1.
> That seems backwards... but seems correct for persistent
> memory, reporting it as a reserved region. That might avoid the
> the EFI_PERSISTENT_MEMORY handling problem (if the preceding
> call to is_normal_ram() didn't already cause problems).

So ... the code is convoluted and could probably do with a
refresh. But is_normal_ram() returning 1 means is_reserve_region()
will return 1, meaning we end up reserving it in memblock and not
allocating in it.

However, this is for is_reserve_region() - we will still have added it
to memblock with early_init_dt_add_memory_arch(), which may have
unwanted side effects.

I thought Ard had some patches in flight to address this, but they
don't appear to be in yet.

/
Leif

2015-12-18 11:46:05

by Mark Rutland

[permalink] [raw]
Subject: Re: arm64/efi handling of persistent memory

On Fri, Dec 18, 2015 at 11:06:51AM +0000, Leif Lindholm wrote:
> On Fri, Dec 18, 2015 at 01:33:25AM +0000, Elliott, Robert (Persistent Memory) wrote:
> > Similar to the questions about the arm64 efi boot stub
> > handing persistent memory, some of the arm64 kernel code
> > looks fishy.
> >
> > In arch/arm64/kernel/efi.c:
> >
> > static int __init is_normal_ram(efi_memory_desc_t *md)
> > {
> > if (md->attribute & EFI_MEMORY_WB)
> > return 1;
> > return 0;
> > }
> >
> > static __init int is_reserve_region(efi_memory_desc_t *md)
> > {
> > switch (md->type) {
> > case EFI_LOADER_CODE:
> > case EFI_LOADER_DATA:
> > case EFI_BOOT_SERVICES_CODE:
> > case EFI_BOOT_SERVICES_DATA:
> > case EFI_CONVENTIONAL_MEMORY:
> > case EFI_PERSISTENT_MEMORY:
> > return 0;
> > default:
> > break;
> > }
> > return is_normal_ram(md);
> > }
> >
> > static __init void reserve_regions(void)
> > {
> > ...
> > if (is_normal_ram(md))
> > early_init_dt_add_memory_arch(paddr, size);
> >
> > if (is_reserve_region(md)) {
> > memblock_reserve(paddr, size);
> > ...
> >
> > static bool __init efi_virtmap_init(void)
> > {
> > ...
> > if (!is_normal_ram(md))
> > prot = __pgprot(PROT_DEVICE_nGnRE);
> > else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
> > !PAGE_ALIGNED(md->phys_addr))
> > prot = PAGE_KERNEL_EXEC;
> > else
> > prot = PAGE_KERNEL;
> >
> > Concerns include:
> >
> > 1. is_normal_ram() will see the WB bit and return 1 regardless
> > of the type. That seems similar to the arm EFI boot stub issue.
> > The three callers are shown above.
>
> So, first and third cases look OK to me, but the bit where we add
> things to memblock just for being WB is bogus.

We should be able to do this much more simply by only ever adding what
we know is safe. We can "reserve" portions by nomapping them. e.g.

bool can_add_to_memblock(efi_memory_desc_t *md)
{
if (!(md->attribute & EFI_MEMORY_WB))
return false;

switch (md->type) {
case EFI_LOADER_CODE:
case EFI_LOADER_DATA:
case EFI_BOOT_SERVICES_CODE:
case EFI_BOOT_SERVICES_DATA:
case EFI_CONVENTIONAL_MEMORY:
return true;
}

return false;
}

for_each_efi_memory_desc(&memmap, md) {
if (can_add_to_memblock(md))
early_init_dt_add_memory_arch(...);
else
memblock_mark_nomap(...);
}

Though I suspect Ard might know some reason that won't work.

> > 2. is_reserve_region() treating EFI_PERSISTENT_MEMORY the same
> > as EFI_CONVENTIONAL_MEMORY looks wrong.
>
> Yeah... That one was introduced by
> ad5fb870c486 ("e820, efi: add ACPI 6.0 persistent memory types")
> without any ACKs from ARM people :/
>
> While it probably wouldn't wreck your system, it is unlikely to do
> what you'd want.

It might corrupt that data you wanted to persist, so it's definitely a
bug. Severity is arguable.

> > 3. We're contemplating working around the grub problem by
> > reporting EFI_RESERVED_MEMORY plus the NV attribute rather
> > than EFI_PERSISTENT_MEMORY.

I'm missing something. Robert, do you mean GRUB has a similar bug?

> That sounds a bit ... nuclear.
> Would you then be expecting to retreive information about the NV
> device out of hw description, or via PCI, rather than the UEFI memory
> map?

That's going to cause much more pain. We should fix this code.

> > If this is done, then is_reserve_region() will fall through
> > to is_normal_ram(), which will see the WB bit and return 1.
> > That seems backwards... but seems correct for persistent
> > memory, reporting it as a reserved region. That might avoid the
> > the EFI_PERSISTENT_MEMORY handling problem (if the preceding
> > call to is_normal_ram() didn't already cause problems).
>
> So ... the code is convoluted and could probably do with a
> refresh. But is_normal_ram() returning 1 means is_reserve_region()
> will return 1, meaning we end up reserving it in memblock and not
> allocating in it.
>
> However, this is for is_reserve_region() - we will still have added it
> to memblock with early_init_dt_add_memory_arch(), which may have
> unwanted side effects.

See above for a suggestion on that front.

> I thought Ard had some patches in flight to address this, but they
> don't appear to be in yet.

The nomap stuff is in the arm64 git repo, in aarch64/efi [1] and
for-next/core [2]. That alone doesn't seem to address this, though.

Thanks,
Mark.

[1] https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/log/?h=aarch64/efi
[2] https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/log/?h=for-next/core

2015-12-18 11:52:41

by Mark Rutland

[permalink] [raw]
Subject: Re: arm64/efi handling of persistent memory

On Fri, Dec 18, 2015 at 11:06:51AM +0000, Leif Lindholm wrote:
> On Fri, Dec 18, 2015 at 01:33:25AM +0000, Elliott, Robert (Persistent Memory) wrote:
> > Similar to the questions about the arm64 efi boot stub
> > handing persistent memory, some of the arm64 kernel code
> > looks fishy.

[...]

> > 2. is_reserve_region() treating EFI_PERSISTENT_MEMORY the same
> > as EFI_CONVENTIONAL_MEMORY looks wrong.
>
> Yeah... That one was introduced by
> ad5fb870c486 ("e820, efi: add ACPI 6.0 persistent memory types")
> without any ACKs from ARM people :/

Do we need to do anythign to avoid his kind of thing in future? e.g. a
MAINTAINERS patch for the ARM EFI bits?

Or do we just need to pay attention to linux-efi?

Mark.

2015-12-18 12:11:45

by Leif Lindholm

[permalink] [raw]
Subject: Re: arm64/efi handling of persistent memory

On Fri, Dec 18, 2015 at 11:52:24AM +0000, Mark Rutland wrote:
> On Fri, Dec 18, 2015 at 11:06:51AM +0000, Leif Lindholm wrote:
> > On Fri, Dec 18, 2015 at 01:33:25AM +0000, Elliott, Robert (Persistent Memory) wrote:
> > > Similar to the questions about the arm64 efi boot stub
> > > handing persistent memory, some of the arm64 kernel code
> > > looks fishy.
>
> [...]
>
> > > 2. is_reserve_region() treating EFI_PERSISTENT_MEMORY the same
> > > as EFI_CONVENTIONAL_MEMORY looks wrong.
> >
> > Yeah... That one was introduced by
> > ad5fb870c486 ("e820, efi: add ACPI 6.0 persistent memory types")
> > without any ACKs from ARM people :/
>
> Do we need to do anythign to avoid his kind of thing in future? e.g. a
> MAINTAINERS patch for the ARM EFI bits?

We've not needed that sort of thing in the past.
And running get_maintainer.pl on ad5fb870c486 does yield Catalin,
Will, Matt Fleming, you, Ard, me, lakml and linux-efi.

> Or do we just need to pay attention to linux-efi?

While I am certainly guilty of not watching that closely enough, I
can't actually find that patch in my linux-efi history. Or lakml.

/
Leif

2015-12-18 13:07:14

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: arm64/efi handling of persistent memory

On 18 December 2015 at 12:45, Mark Rutland <[email protected]> wrote:
> On Fri, Dec 18, 2015 at 11:06:51AM +0000, Leif Lindholm wrote:
>> On Fri, Dec 18, 2015 at 01:33:25AM +0000, Elliott, Robert (Persistent Memory) wrote:
>> > Similar to the questions about the arm64 efi boot stub
>> > handing persistent memory, some of the arm64 kernel code
>> > looks fishy.
>> >
>> > In arch/arm64/kernel/efi.c:
>> >
>> > static int __init is_normal_ram(efi_memory_desc_t *md)
>> > {
>> > if (md->attribute & EFI_MEMORY_WB)
>> > return 1;
>> > return 0;
>> > }
>> >
>> > static __init int is_reserve_region(efi_memory_desc_t *md)
>> > {
>> > switch (md->type) {
>> > case EFI_LOADER_CODE:
>> > case EFI_LOADER_DATA:
>> > case EFI_BOOT_SERVICES_CODE:
>> > case EFI_BOOT_SERVICES_DATA:
>> > case EFI_CONVENTIONAL_MEMORY:
>> > case EFI_PERSISTENT_MEMORY:
>> > return 0;
>> > default:
>> > break;
>> > }
>> > return is_normal_ram(md);
>> > }
>> >
>> > static __init void reserve_regions(void)
>> > {
>> > ...
>> > if (is_normal_ram(md))
>> > early_init_dt_add_memory_arch(paddr, size);
>> >
>> > if (is_reserve_region(md)) {
>> > memblock_reserve(paddr, size);
>> > ...
>> >
>> > static bool __init efi_virtmap_init(void)
>> > {
>> > ...
>> > if (!is_normal_ram(md))
>> > prot = __pgprot(PROT_DEVICE_nGnRE);
>> > else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
>> > !PAGE_ALIGNED(md->phys_addr))
>> > prot = PAGE_KERNEL_EXEC;
>> > else
>> > prot = PAGE_KERNEL;
>> >
>> > Concerns include:
>> >
>> > 1. is_normal_ram() will see the WB bit and return 1 regardless
>> > of the type. That seems similar to the arm EFI boot stub issue.
>> > The three callers are shown above.
>>
>> So, first and third cases look OK to me, but the bit where we add
>> things to memblock just for being WB is bogus.
>
> We should be able to do this much more simply by only ever adding what
> we know is safe. We can "reserve" portions by nomapping them. e.g.
>
> bool can_add_to_memblock(efi_memory_desc_t *md)
> {
> if (!(md->attribute & EFI_MEMORY_WB))
> return false;
>
> switch (md->type) {
> case EFI_LOADER_CODE:
> case EFI_LOADER_DATA:
> case EFI_BOOT_SERVICES_CODE:
> case EFI_BOOT_SERVICES_DATA:
> case EFI_CONVENTIONAL_MEMORY:
> return true;
> }
>
> return false;
> }
>
> for_each_efi_memory_desc(&memmap, md) {
> if (can_add_to_memblock(md))
> early_init_dt_add_memory_arch(...);
> else
> memblock_mark_nomap(...);
> }
>
> Though I suspect Ard might know some reason that won't work.
>

Before we start hacking away at this at the arm64/EFI level, do we
have any documentation and/or consensus regarding how persistent
memory should be treated in the first place? Should it be covered by
memblock? Should it be covered by the linear mapping? Should it be
memblock_reserve()'d?

>> > 2. is_reserve_region() treating EFI_PERSISTENT_MEMORY the same
>> > as EFI_CONVENTIONAL_MEMORY looks wrong.
>>
>> Yeah... That one was introduced by
>> ad5fb870c486 ("e820, efi: add ACPI 6.0 persistent memory types")
>> without any ACKs from ARM people :/
>>
>> While it probably wouldn't wreck your system, it is unlikely to do
>> what you'd want.
>
> It might corrupt that data you wanted to persist, so it's definitely a
> bug. Severity is arguable.
>
>> > 3. We're contemplating working around the grub problem by
>> > reporting EFI_RESERVED_MEMORY plus the NV attribute rather
>> > than EFI_PERSISTENT_MEMORY.
>
> I'm missing something. Robert, do you mean GRUB has a similar bug?
>
>> That sounds a bit ... nuclear.
>> Would you then be expecting to retreive information about the NV
>> device out of hw description, or via PCI, rather than the UEFI memory
>> map?
>
> That's going to cause much more pain. We should fix this code.
>
>> > If this is done, then is_reserve_region() will fall through
>> > to is_normal_ram(), which will see the WB bit and return 1.
>> > That seems backwards... but seems correct for persistent
>> > memory, reporting it as a reserved region. That might avoid the
>> > the EFI_PERSISTENT_MEMORY handling problem (if the preceding
>> > call to is_normal_ram() didn't already cause problems).
>>
>> So ... the code is convoluted and could probably do with a
>> refresh. But is_normal_ram() returning 1 means is_reserve_region()
>> will return 1, meaning we end up reserving it in memblock and not
>> allocating in it.
>>
>> However, this is for is_reserve_region() - we will still have added it
>> to memblock with early_init_dt_add_memory_arch(), which may have
>> unwanted side effects.
>
> See above for a suggestion on that front.
>
>> I thought Ard had some patches in flight to address this, but they
>> don't appear to be in yet.
>
> The nomap stuff is in the arm64 git repo, in aarch64/efi [1] and
> for-next/core [2]. That alone doesn't seem to address this, though.
>
> Thanks,
> Mark.
>
> [1] https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/log/?h=aarch64/efi
> [2] https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/log/?h=for-next/core

2015-12-18 13:27:57

by Matt Fleming

[permalink] [raw]
Subject: Re: arm64/efi handling of persistent memory

On Fri, 18 Dec, at 11:52:24AM, Mark Rutland wrote:
> On Fri, Dec 18, 2015 at 11:06:51AM +0000, Leif Lindholm wrote:
> > On Fri, Dec 18, 2015 at 01:33:25AM +0000, Elliott, Robert (Persistent Memory) wrote:
> > > Similar to the questions about the arm64 efi boot stub
> > > handing persistent memory, some of the arm64 kernel code
> > > looks fishy.
>
> [...]
>
> > > 2. is_reserve_region() treating EFI_PERSISTENT_MEMORY the same
> > > as EFI_CONVENTIONAL_MEMORY looks wrong.
> >
> > Yeah... That one was introduced by
> > ad5fb870c486 ("e820, efi: add ACPI 6.0 persistent memory types")
> > without any ACKs from ARM people :/
>
> Do we need to do anythign to avoid his kind of thing in future? e.g. a
> MAINTAINERS patch for the ARM EFI bits?
>
> Or do we just need to pay attention to linux-efi?

It never hit linux-efi, and I wasn't Cc'd, which means the entries
that do exist in MAINTAINERS were ignored anyway.

Because what would usually happens in that situation is that I would
ask ARM people to chime in.

It looks like all the NVDIMM changes came via Dan's nvdimm tree.