2019-12-04 07:53:46

by Dave Young

[permalink] [raw]
Subject: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

Michael Weiser reported he got below error during a kexec rebooting:
esrt: Unsupported ESRT version 2904149718861218184.

The ESRT memory stays in EFI boot services data, and it was reserved
in kernel via efi_mem_reserve(). The initial purpose of the reservation
is to reuse the EFI boot services data across kexec reboot. For example
the BGRT image data and some ESRT memory like Michael reported.

But although the memory is reserved it is not updated in X86 e820 table.
And kexec_file_load iterate system ram in io resource list to find places
for kernel, initramfs and other stuff. In Michael's case the kexec loaded
initramfs overwritten the ESRT memory and then the failure happened.

Since kexec_file_load depends on the e820 to be updated, just fix this
by updating the reserved EFI boot services memory as reserved type in e820.

Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
bypassed in the reservation code path because they are assumed as reserved.
But the reservation is still needed for multiple kexec reboot.
And it is the only possible case we come here thus just drop the code
chunk then everything works without side effects.

On my machine the ESRT memory sits in an EFI runtime data range, it does
not trigger the problem, but I successfully tested with BGRT instead.
both kexec_load and kexec_file_load work and kdump works as well.

Signed-off-by: Dave Young <[email protected]>
---
arch/x86/platform/efi/quirks.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

--- linux-x86.orig/arch/x86/platform/efi/quirks.c
+++ linux-x86/arch/x86/platform/efi/quirks.c
@@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
return;
}

- /* No need to reserve regions that will never be freed. */
- if (md.attribute & EFI_MEMORY_RUNTIME)
- return;
-
size += addr % EFI_PAGE_SIZE;
size = round_up(size, EFI_PAGE_SIZE);
addr = round_down(addr, EFI_PAGE_SIZE);
@@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
early_memunmap(new, new_size);

efi_memmap_install(new_phys, num_entries);
+ e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
+ e820__update_table(e820_table);
}

/*


2019-12-04 08:01:22

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

On 12/04/19 at 03:52pm, Dave Young wrote:
> Michael Weiser reported he got below error during a kexec rebooting:
> esrt: Unsupported ESRT version 2904149718861218184.
>
> The ESRT memory stays in EFI boot services data, and it was reserved
> in kernel via efi_mem_reserve(). The initial purpose of the reservation
> is to reuse the EFI boot services data across kexec reboot. For example
> the BGRT image data and some ESRT memory like Michael reported.
>
> But although the memory is reserved it is not updated in X86 e820 table.
> And kexec_file_load iterate system ram in io resource list to find places
> for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> initramfs overwritten the ESRT memory and then the failure happened.

s/overwritten/overwrote :) If need a repost please let me know..

>
> Since kexec_file_load depends on the e820 to be updated, just fix this
> by updating the reserved EFI boot services memory as reserved type in e820.
>
> Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> bypassed in the reservation code path because they are assumed as reserved.
> But the reservation is still needed for multiple kexec reboot.
> And it is the only possible case we come here thus just drop the code
> chunk then everything works without side effects.
>
> On my machine the ESRT memory sits in an EFI runtime data range, it does
> not trigger the problem, but I successfully tested with BGRT instead.
> both kexec_load and kexec_file_load work and kdump works as well.
>
> Signed-off-by: Dave Young <[email protected]>
> ---
> arch/x86/platform/efi/quirks.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> +++ linux-x86/arch/x86/platform/efi/quirks.c
> @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> return;
> }
>
> - /* No need to reserve regions that will never be freed. */
> - if (md.attribute & EFI_MEMORY_RUNTIME)
> - return;
> -
> size += addr % EFI_PAGE_SIZE;
> size = round_up(size, EFI_PAGE_SIZE);
> addr = round_down(addr, EFI_PAGE_SIZE);
> @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> early_memunmap(new, new_size);
>
> efi_memmap_install(new_phys, num_entries);
> + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> + e820__update_table(e820_table);
> }
>
> /*

Michael, could you a one more test and provide a tested-by if it works
for you?

Thanks
Dave

2019-12-04 10:10:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage


* Dave Young <[email protected]> wrote:

> On 12/04/19 at 03:52pm, Dave Young wrote:
> > Michael Weiser reported he got below error during a kexec rebooting:
> > esrt: Unsupported ESRT version 2904149718861218184.
> >
> > The ESRT memory stays in EFI boot services data, and it was reserved
> > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > is to reuse the EFI boot services data across kexec reboot. For example
> > the BGRT image data and some ESRT memory like Michael reported.
> >
> > But although the memory is reserved it is not updated in X86 e820 table.
> > And kexec_file_load iterate system ram in io resource list to find places
> > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > initramfs overwritten the ESRT memory and then the failure happened.
>
> s/overwritten/overwrote :) If need a repost please let me know..

No need, I've edited the typo. :)

Thanks,

Ingo

2019-12-04 10:15:02

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage


* Dave Young <[email protected]> wrote:

> On 12/04/19 at 03:52pm, Dave Young wrote:
> > Michael Weiser reported he got below error during a kexec rebooting:
> > esrt: Unsupported ESRT version 2904149718861218184.
> >
> > The ESRT memory stays in EFI boot services data, and it was reserved
> > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > is to reuse the EFI boot services data across kexec reboot. For example
> > the BGRT image data and some ESRT memory like Michael reported.
> >
> > But although the memory is reserved it is not updated in X86 e820 table.
> > And kexec_file_load iterate system ram in io resource list to find places
> > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > initramfs overwritten the ESRT memory and then the failure happened.
>
> s/overwritten/overwrote :) If need a repost please let me know..
>
> >
> > Since kexec_file_load depends on the e820 to be updated, just fix this
> > by updating the reserved EFI boot services memory as reserved type in e820.
> >
> > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > bypassed in the reservation code path because they are assumed as reserved.
> > But the reservation is still needed for multiple kexec reboot.
> > And it is the only possible case we come here thus just drop the code
> > chunk then everything works without side effects.
> >
> > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > not trigger the problem, but I successfully tested with BGRT instead.
> > both kexec_load and kexec_file_load work and kdump works as well.
> >
> > Signed-off-by: Dave Young <[email protected]>


So I edited this to:

From: Dave Young <[email protected]>

Michael Weiser reported he got this error during a kexec rebooting:

esrt: Unsupported ESRT version 2904149718861218184.

The ESRT memory stays in EFI boot services data, and it was reserved
in kernel via efi_mem_reserve(). The initial purpose of the reservation
is to reuse the EFI boot services data across kexec reboot. For example
the BGRT image data and some ESRT memory like Michael reported.

But although the memory is reserved it is not updated in the X86 E820 table,
and kexec_file_load() iterates system RAM in the IO resource list to find places
for kernel, initramfs and other stuff. In Michael's case the kexec loaded
initramfs overwrote the ESRT memory and then the failure happened.

Since kexec_file_load() depends on the E820 table being updated, just fix this
by updating the reserved EFI boot services memory as reserved type in E820.

Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
bypassed in the reservation code path because they are assumed as reserved.

But the reservation is still needed for multiple kexec reboots,
and it is the only possible case we come here thus just drop the code
chunk, then everything works without side effects.

On my machine the ESRT memory sits in an EFI runtime data range, it does
not trigger the problem, but I successfully tested with BGRT instead.
both kexec_load() and kexec_file_load() work and kdump works as well.

Thanks,

Ingo

Subject: [tip: x86/urgent] x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: af164898482817a1d487964b68f3c21bae7a1beb
Gitweb: https://git.kernel.org/tip/af164898482817a1d487964b68f3c21bae7a1beb
Author: Dave Young <[email protected]>
AuthorDate: Wed, 04 Dec 2019 15:52:33 +08:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Wed, 04 Dec 2019 11:15:30 +01:00

x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage

Michael Weiser reported that he got this error during a kexec rebooting:

esrt: Unsupported ESRT version 2904149718861218184.

The ESRT memory stays in EFI boot services data, and it was reserved
in kernel via efi_mem_reserve(). The initial purpose of the reservation
is to reuse the EFI boot services data across kexec reboot. For example
the BGRT image data and some ESRT memory like Michael reported.

But although the memory is reserved it is not updated in the X86 E820 table,
and kexec_file_load() iterates system RAM in the IO resource list to find places
for kernel, initramfs and other stuff. In Michael's case the kexec loaded
initramfs overwrote the ESRT memory and then the failure happened.

Since kexec_file_load() depends on the E820 table being updated, just fix this
by updating the reserved EFI boot services memory as reserved type in E820.

Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
bypassed in the reservation code path because they are assumed as reserved.

But the reservation is still needed for multiple kexec reboots,
and it is the only possible case we come here thus just drop the code
chunk, then everything works without side effects.

On my machine the ESRT memory sits in an EFI runtime data range, it does
not trigger the problem, but I successfully tested with BGRT instead.
both kexec_load() and kexec_file_load() work and kdump works as well.

[ mingo: Edited the changelog. ]

Reported-by: Michael Weiser <[email protected]>
Tested-by: Michael Weiser <[email protected]>
Signed-off-by: Dave Young <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/platform/efi/quirks.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 7675cf7..f8f0220 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
return;
}

- /* No need to reserve regions that will never be freed. */
- if (md.attribute & EFI_MEMORY_RUNTIME)
- return;
-
size += addr % EFI_PAGE_SIZE;
size = round_up(size, EFI_PAGE_SIZE);
addr = round_down(addr, EFI_PAGE_SIZE);
@@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
early_memunmap(new, new_size);

efi_memmap_install(new_phys, num_entries);
+ e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
+ e820__update_table(e820_table);
}

/*

2019-12-04 10:24:49

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

On Wed, 4 Dec 2019 at 10:14, Ingo Molnar <[email protected]> wrote:
>
>
> * Dave Young <[email protected]> wrote:
>
> > On 12/04/19 at 03:52pm, Dave Young wrote:
> > > Michael Weiser reported he got below error during a kexec rebooting:
> > > esrt: Unsupported ESRT version 2904149718861218184.
> > >
> > > The ESRT memory stays in EFI boot services data, and it was reserved
> > > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > > is to reuse the EFI boot services data across kexec reboot. For example
> > > the BGRT image data and some ESRT memory like Michael reported.
> > >
> > > But although the memory is reserved it is not updated in X86 e820 table.
> > > And kexec_file_load iterate system ram in io resource list to find places
> > > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > > initramfs overwritten the ESRT memory and then the failure happened.
> >
> > s/overwritten/overwrote :) If need a repost please let me know..
> >
> > >
> > > Since kexec_file_load depends on the e820 to be updated, just fix this
> > > by updating the reserved EFI boot services memory as reserved type in e820.
> > >
> > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > > bypassed in the reservation code path because they are assumed as reserved.
> > > But the reservation is still needed for multiple kexec reboot.
> > > And it is the only possible case we come here thus just drop the code
> > > chunk then everything works without side effects.
> > >
> > > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > > not trigger the problem, but I successfully tested with BGRT instead.
> > > both kexec_load and kexec_file_load work and kdump works as well.
> > >
> > > Signed-off-by: Dave Young <[email protected]>
>
>
> So I edited this to:
>
> From: Dave Young <[email protected]>
>
> Michael Weiser reported he got this error during a kexec rebooting:
>
> esrt: Unsupported ESRT version 2904149718861218184.
>
> The ESRT memory stays in EFI boot services data, and it was reserved
> in kernel via efi_mem_reserve(). The initial purpose of the reservation
> is to reuse the EFI boot services data across kexec reboot. For example
> the BGRT image data and some ESRT memory like Michael reported.
>
> But although the memory is reserved it is not updated in the X86 E820 table,
> and kexec_file_load() iterates system RAM in the IO resource list to find places
> for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> initramfs overwrote the ESRT memory and then the failure happened.
>
> Since kexec_file_load() depends on the E820 table being updated, just fix this
> by updating the reserved EFI boot services memory as reserved type in E820.
>
> Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> bypassed in the reservation code path because they are assumed as reserved.
>
> But the reservation is still needed for multiple kexec reboots,
> and it is the only possible case we come here thus just drop the code
> chunk, then everything works without side effects.
>
> On my machine the ESRT memory sits in an EFI runtime data range, it does
> not trigger the problem, but I successfully tested with BGRT instead.
> both kexec_load() and kexec_file_load() work and kdump works as well.
>

Acked-by: Ard Biesheuvel <[email protected]>

2019-12-04 11:47:29

by Michael Weiser

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

Hello Dave,

On Wed, Dec 04, 2019 at 03:59:17PM +0800, Dave Young wrote:
> > Signed-off-by: Dave Young <[email protected]>
> > ---
> > arch/x86/platform/efi/quirks.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> > return;
> > }
> >
> > - /* No need to reserve regions that will never be freed. */
> > - if (md.attribute & EFI_MEMORY_RUNTIME)
> > - return;
> > -
> > size += addr % EFI_PAGE_SIZE;
> > size = round_up(size, EFI_PAGE_SIZE);
> > addr = round_down(addr, EFI_PAGE_SIZE);
> > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> > early_memunmap(new, new_size);
> >
> > efi_memmap_install(new_phys, num_entries);
> > + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > + e820__update_table(e820_table);
> > }
> >
> > /*
> Michael, could you a one more test and provide a tested-by if it works
> for you?

Did three successful kexecs in sequence of mainline 5.4.0 plus the patch
(had problems getting recent -next to boot on my machine). ESRT region
stayed reserved and intact so that the "Invalid version" error message
is gone.

Tested-by: Michael Weiser <[email protected]>
--
Thanks!
Michael

2019-12-05 10:56:46

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

On 12/04/19 at 11:14am, Ingo Molnar wrote:
>
> * Dave Young <[email protected]> wrote:
>
> > On 12/04/19 at 03:52pm, Dave Young wrote:
> > > Michael Weiser reported he got below error during a kexec rebooting:
> > > esrt: Unsupported ESRT version 2904149718861218184.
> > >
> > > The ESRT memory stays in EFI boot services data, and it was reserved
> > > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > > is to reuse the EFI boot services data across kexec reboot. For example
> > > the BGRT image data and some ESRT memory like Michael reported.
> > >
> > > But although the memory is reserved it is not updated in X86 e820 table.
> > > And kexec_file_load iterate system ram in io resource list to find places
> > > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > > initramfs overwritten the ESRT memory and then the failure happened.
> >
> > s/overwritten/overwrote :) If need a repost please let me know..
> >
> > >
> > > Since kexec_file_load depends on the e820 to be updated, just fix this
> > > by updating the reserved EFI boot services memory as reserved type in e820.
> > >
> > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > > bypassed in the reservation code path because they are assumed as reserved.
> > > But the reservation is still needed for multiple kexec reboot.
> > > And it is the only possible case we come here thus just drop the code
> > > chunk then everything works without side effects.
> > >
> > > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > > not trigger the problem, but I successfully tested with BGRT instead.
> > > both kexec_load and kexec_file_load work and kdump works as well.
> > >
> > > Signed-off-by: Dave Young <[email protected]>
>
>
> So I edited this to:
>
> From: Dave Young <[email protected]>
>
> Michael Weiser reported he got this error during a kexec rebooting:
>
> esrt: Unsupported ESRT version 2904149718861218184.
>
> The ESRT memory stays in EFI boot services data, and it was reserved
> in kernel via efi_mem_reserve(). The initial purpose of the reservation
> is to reuse the EFI boot services data across kexec reboot. For example
> the BGRT image data and some ESRT memory like Michael reported.
>
> But although the memory is reserved it is not updated in the X86 E820 table,
> and kexec_file_load() iterates system RAM in the IO resource list to find places
> for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> initramfs overwrote the ESRT memory and then the failure happened.
>
> Since kexec_file_load() depends on the E820 table being updated, just fix this
> by updating the reserved EFI boot services memory as reserved type in E820.
>
> Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> bypassed in the reservation code path because they are assumed as reserved.
>
> But the reservation is still needed for multiple kexec reboots,
> and it is the only possible case we come here thus just drop the code
> chunk, then everything works without side effects.
>
> On my machine the ESRT memory sits in an EFI runtime data range, it does
> not trigger the problem, but I successfully tested with BGRT instead.
> both kexec_load() and kexec_file_load() work and kdump works as well.
>

Thanks for the amending, also thank all for the review and test.

Dave

2019-12-05 21:32:33

by Michael Weiser

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

On Thu, Dec 05, 2019 at 06:55:45PM +0800, Dave Young wrote:

> > esrt: Unsupported ESRT version 2904149718861218184.
> >
> > The ESRT memory stays in EFI boot services data, and it was reserved
> > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > is to reuse the EFI boot services data across kexec reboot. For example
> > the BGRT image data and some ESRT memory like Michael reported.
> >
> > But although the memory is reserved it is not updated in the X86 E820 table,
> > and kexec_file_load() iterates system RAM in the IO resource list to find places
> > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > initramfs overwrote the ESRT memory and then the failure happened.
> >
> > Since kexec_file_load() depends on the E820 table being updated, just fix this
> > by updating the reserved EFI boot services memory as reserved type in E820.
> Thanks for the amending, also thank all for the review and test.

Same from me, particularly everyone's patience with my haphazard
guesswork around an area I clearly know nothing about. :)
--
Thanks,
Michael

2019-12-28 20:55:12

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

On Tue, Dec 3, 2019 at 11:53 PM Dave Young <[email protected]> wrote:
>
> Michael Weiser reported he got below error during a kexec rebooting:
> esrt: Unsupported ESRT version 2904149718861218184.
>
> The ESRT memory stays in EFI boot services data, and it was reserved
> in kernel via efi_mem_reserve(). The initial purpose of the reservation
> is to reuse the EFI boot services data across kexec reboot. For example
> the BGRT image data and some ESRT memory like Michael reported.
>
> But although the memory is reserved it is not updated in X86 e820 table.
> And kexec_file_load iterate system ram in io resource list to find places
> for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> initramfs overwritten the ESRT memory and then the failure happened.
>
> Since kexec_file_load depends on the e820 to be updated, just fix this
> by updating the reserved EFI boot services memory as reserved type in e820.
>
> Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> bypassed in the reservation code path because they are assumed as reserved.
> But the reservation is still needed for multiple kexec reboot.
> And it is the only possible case we come here thus just drop the code
> chunk then everything works without side effects.
>
> On my machine the ESRT memory sits in an EFI runtime data range, it does
> not trigger the problem, but I successfully tested with BGRT instead.
> both kexec_load and kexec_file_load work and kdump works as well.
>
> Signed-off-by: Dave Young <[email protected]>
> ---
> arch/x86/platform/efi/quirks.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> +++ linux-x86/arch/x86/platform/efi/quirks.c
> @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> return;
> }
>
> - /* No need to reserve regions that will never be freed. */
> - if (md.attribute & EFI_MEMORY_RUNTIME)
> - return;
> -
> size += addr % EFI_PAGE_SIZE;
> size = round_up(size, EFI_PAGE_SIZE);
> addr = round_down(addr, EFI_PAGE_SIZE);
> @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> early_memunmap(new, new_size);
>
> efi_memmap_install(new_phys, num_entries);
> + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> + e820__update_table(e820_table);
> }
>
> /*
>

Bisect says this change (commit af1648984828) is triggering a
regression, likely not urgent, in my testing of the new efi_fake_mem=
facility to allow memory to be marked "soft reserved" via the kernel
command line (commit 199c84717612 x86/efi: Add efi_fake_mem support
for EFI_MEMORY_SP). The following command line triggers the crash
signature below:

efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000

However, this command line works ok:

efi_fake_mem=8G@9G:0x40000

So, something about multiple efi_fake_mem statements interacts badly
with this change. Nothing obvious occurs to me at the moment, I'll
keep debugging, but wanted to highlight this in the meantime in case
someone else sees a deeper issue or the root cause.

BUG: unable to handle page fault for address: ffffffffff281000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 188615067 P4D 188615067 PUD 188617067 PMD 188e4d067 PTE 0
Oops: 0002 [#1] SMP PTI
CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0+ #154
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
RIP: 0010:efi_memmap_insert+0xed/0x14b
Code: 48 89 48 18 49 39 d9 76 67 49 39 d1 73 62 4c 89 c9 48 2b 48 08
4c 89 c6 48 c1 e9 0c 48 89 48 18 49 8b 4a 28 48 01 c8 48 89 c7 <f3> a4
49 39 d3 73 2c 4c 89 48 08 4c 29 da 4c 89 c6 4c 89 68 18 48
RSP: 0000:ffffffffb7603d70 EFLAGS: 00010086
RAX: ffffffffff280ff0 RBX: 0000000000000000 RCX: 0000000000000020
RDX: ffffffffffffffff RSI: ffffffffff200fe0 RDI: ffffffffff281000
RBP: 00000000bea2d000 R08: ffffffffff200fd0 R09: 00000000bea06000
R10: ffffffffb77e1718 R11: 00000000bea2cfff R12: 800000000000000f
R13: 0000000000000027 R14: ffffffff415fa001 R15: 0000000000000ab0
FS: 0000000000000000(0000) GS:ffffffffb7c31000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff281000 CR3: 0000000188610000 CR4: 00000000000606b0
Call Trace:
? efi_arch_mem_reserve+0x149/0x1a6
? bgrt_init+0xbe/0xbe
? bgrt_init+0xbe/0xbe
? acpi_parse_bgrt+0xa/0xd
? acpi_table_parse+0x86/0xb8
? acpi_boot_init+0x494/0x4e3
? acpi_parse_x2apic+0x87/0x87
? setup_acpi_sci+0xa2/0xa2
? setup_arch+0x8db/0x9e1
? start_kernel+0x6a/0x547
? secondary_startup_64+0xb6/0xc0
Modules linked in:
CR2: ffffffffff281000
random: get_random_bytes called from print_oops_end_marker+0x26/0x40
with crng_init=0
---[ end trace 2acc14aa0990ee9d ]---
RIP: 0010:efi_memmap_insert+0xed/0x14b
Code: 48 89 48 18 49 39 d9 76 67 49 39 d1 73 62 4c 89 c9 48 2b 48 08
4c 89 c6 48 c1 e9 0c 48 89 48 18 49 8b 4a 28 48 01 c8 48 89 c7 <f3> a4
49 39 d3 73 2c 4c 89 48 08 4c 29 da 4c 89 c6 4c 89 68 18 48
RSP: 0000:ffffffffb7603d70 EFLAGS: 00010086
RAX: ffffffffff280ff0 RBX: 0000000000000000 RCX: 0000000000000020
RDX: ffffffffffffffff RSI: ffffffffff200fe0 RDI: ffffffffff281000
RBP: 00000000bea2d000 R08: ffffffffff200fd0 R09: 00000000bea06000
R10: ffffffffb77e1718 R11: 00000000bea2cfff R12: 800000000000000f
R13: 0000000000000027 R14: ffffffff415fa001 R15: 0000000000000ab0
FS: 0000000000000000(0000) GS:ffffffffb7c31000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff281000 CR3: 0000000188610000 CR4: 00000000000606b0
Kernel panic - not syncing: Fatal exception
---[ end Kernel panic - not syncing: Fatal exception ]---

2019-12-29 06:15:05

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

On Sat, Dec 28, 2019 at 12:54 PM Dan Williams
<[email protected]> wrote:
>
> On Tue, Dec 3, 2019 at 11:53 PM Dave Young <[email protected]> wrote:
> >
> > Michael Weiser reported he got below error during a kexec rebooting:
> > esrt: Unsupported ESRT version 2904149718861218184.
> >
> > The ESRT memory stays in EFI boot services data, and it was reserved
> > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > is to reuse the EFI boot services data across kexec reboot. For example
> > the BGRT image data and some ESRT memory like Michael reported.
> >
> > But although the memory is reserved it is not updated in X86 e820 table.
> > And kexec_file_load iterate system ram in io resource list to find places
> > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > initramfs overwritten the ESRT memory and then the failure happened.
> >
> > Since kexec_file_load depends on the e820 to be updated, just fix this
> > by updating the reserved EFI boot services memory as reserved type in e820.
> >
> > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > bypassed in the reservation code path because they are assumed as reserved.
> > But the reservation is still needed for multiple kexec reboot.
> > And it is the only possible case we come here thus just drop the code
> > chunk then everything works without side effects.
> >
> > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > not trigger the problem, but I successfully tested with BGRT instead.
> > both kexec_load and kexec_file_load work and kdump works as well.
> >
> > Signed-off-by: Dave Young <[email protected]>
> > ---
> > arch/x86/platform/efi/quirks.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> > return;
> > }
> >
> > - /* No need to reserve regions that will never be freed. */
> > - if (md.attribute & EFI_MEMORY_RUNTIME)
> > - return;
> > -
> > size += addr % EFI_PAGE_SIZE;
> > size = round_up(size, EFI_PAGE_SIZE);
> > addr = round_down(addr, EFI_PAGE_SIZE);
> > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> > early_memunmap(new, new_size);
> >
> > efi_memmap_install(new_phys, num_entries);
> > + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > + e820__update_table(e820_table);
> > }
> >
> > /*
> >
>
> Bisect says this change (commit af1648984828) is triggering a
> regression, likely not urgent, in my testing of the new efi_fake_mem=
> facility to allow memory to be marked "soft reserved" via the kernel
> command line (commit 199c84717612 x86/efi: Add efi_fake_mem support
> for EFI_MEMORY_SP). The following command line triggers the crash
> signature below:
>
> efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000
>
> However, this command line works ok:
>
> efi_fake_mem=8G@9G:0x40000
>
> So, something about multiple efi_fake_mem statements interacts badly
> with this change. Nothing obvious occurs to me at the moment, I'll
> keep debugging, but wanted to highlight this in the meantime in case
> someone else sees a deeper issue or the root cause.

Still looking, but this failure does not seem to be specific to the
"soft reservation" changes. Any update to the efi memmap that pushes
it over a page boundary triggers this failure. I.e. I can fix the
problem by over-allocating the efi memmap and then page aligning the
result. __early_ioremap "should" be handling this case, but it appears
something else is messing this up.

>
> BUG: unable to handle page fault for address: ffffffffff281000
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 188615067 P4D 188615067 PUD 188617067 PMD 188e4d067 PTE 0
> Oops: 0002 [#1] SMP PTI
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0+ #154
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> RIP: 0010:efi_memmap_insert+0xed/0x14b
> Code: 48 89 48 18 49 39 d9 76 67 49 39 d1 73 62 4c 89 c9 48 2b 48 08
> 4c 89 c6 48 c1 e9 0c 48 89 48 18 49 8b 4a 28 48 01 c8 48 89 c7 <f3> a4
> 49 39 d3 73 2c 4c 89 48 08 4c 29 da 4c 89 c6 4c 89 68 18 48
> RSP: 0000:ffffffffb7603d70 EFLAGS: 00010086
> RAX: ffffffffff280ff0 RBX: 0000000000000000 RCX: 0000000000000020
> RDX: ffffffffffffffff RSI: ffffffffff200fe0 RDI: ffffffffff281000
> RBP: 00000000bea2d000 R08: ffffffffff200fd0 R09: 00000000bea06000
> R10: ffffffffb77e1718 R11: 00000000bea2cfff R12: 800000000000000f
> R13: 0000000000000027 R14: ffffffff415fa001 R15: 0000000000000ab0
> FS: 0000000000000000(0000) GS:ffffffffb7c31000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff281000 CR3: 0000000188610000 CR4: 00000000000606b0
> Call Trace:
> ? efi_arch_mem_reserve+0x149/0x1a6
> ? bgrt_init+0xbe/0xbe
> ? bgrt_init+0xbe/0xbe
> ? acpi_parse_bgrt+0xa/0xd
> ? acpi_table_parse+0x86/0xb8
> ? acpi_boot_init+0x494/0x4e3
> ? acpi_parse_x2apic+0x87/0x87
> ? setup_acpi_sci+0xa2/0xa2
> ? setup_arch+0x8db/0x9e1
> ? start_kernel+0x6a/0x547
> ? secondary_startup_64+0xb6/0xc0
> Modules linked in:
> CR2: ffffffffff281000
> random: get_random_bytes called from print_oops_end_marker+0x26/0x40
> with crng_init=0
> ---[ end trace 2acc14aa0990ee9d ]---
> RIP: 0010:efi_memmap_insert+0xed/0x14b
> Code: 48 89 48 18 49 39 d9 76 67 49 39 d1 73 62 4c 89 c9 48 2b 48 08
> 4c 89 c6 48 c1 e9 0c 48 89 48 18 49 8b 4a 28 48 01 c8 48 89 c7 <f3> a4
> 49 39 d3 73 2c 4c 89 48 08 4c 29 da 4c 89 c6 4c 89 68 18 48
> RSP: 0000:ffffffffb7603d70 EFLAGS: 00010086
> RAX: ffffffffff280ff0 RBX: 0000000000000000 RCX: 0000000000000020
> RDX: ffffffffffffffff RSI: ffffffffff200fe0 RDI: ffffffffff281000
> RBP: 00000000bea2d000 R08: ffffffffff200fd0 R09: 00000000bea06000
> R10: ffffffffb77e1718 R11: 00000000bea2cfff R12: 800000000000000f
> R13: 0000000000000027 R14: ffffffff415fa001 R15: 0000000000000ab0
> FS: 0000000000000000(0000) GS:ffffffffb7c31000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff281000 CR3: 0000000188610000 CR4: 00000000000606b0
> Kernel panic - not syncing: Fatal exception
> ---[ end Kernel panic - not syncing: Fatal exception ]---

2019-12-29 14:26:11

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

Hi Dan,
On 12/28/19 at 10:13pm, Dan Williams wrote:
> On Sat, Dec 28, 2019 at 12:54 PM Dan Williams
> <[email protected]> wrote:
> >
> > On Tue, Dec 3, 2019 at 11:53 PM Dave Young <[email protected]> wrote:
> > >
> > > Michael Weiser reported he got below error during a kexec rebooting:
> > > esrt: Unsupported ESRT version 2904149718861218184.
> > >
> > > The ESRT memory stays in EFI boot services data, and it was reserved
> > > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > > is to reuse the EFI boot services data across kexec reboot. For example
> > > the BGRT image data and some ESRT memory like Michael reported.
> > >
> > > But although the memory is reserved it is not updated in X86 e820 table.
> > > And kexec_file_load iterate system ram in io resource list to find places
> > > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > > initramfs overwritten the ESRT memory and then the failure happened.
> > >
> > > Since kexec_file_load depends on the e820 to be updated, just fix this
> > > by updating the reserved EFI boot services memory as reserved type in e820.
> > >
> > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > > bypassed in the reservation code path because they are assumed as reserved.
> > > But the reservation is still needed for multiple kexec reboot.
> > > And it is the only possible case we come here thus just drop the code
> > > chunk then everything works without side effects.
> > >
> > > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > > not trigger the problem, but I successfully tested with BGRT instead.
> > > both kexec_load and kexec_file_load work and kdump works as well.
> > >
> > > Signed-off-by: Dave Young <[email protected]>
> > > ---
> > > arch/x86/platform/efi/quirks.c | 6 ++----
> > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> > > return;
> > > }
> > >
> > > - /* No need to reserve regions that will never be freed. */
> > > - if (md.attribute & EFI_MEMORY_RUNTIME)
> > > - return;
> > > -
> > > size += addr % EFI_PAGE_SIZE;
> > > size = round_up(size, EFI_PAGE_SIZE);
> > > addr = round_down(addr, EFI_PAGE_SIZE);
> > > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> > > early_memunmap(new, new_size);
> > >
> > > efi_memmap_install(new_phys, num_entries);
> > > + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > > + e820__update_table(e820_table);
> > > }
> > >
> > > /*
> > >
> >
> > Bisect says this change (commit af1648984828) is triggering a
> > regression, likely not urgent, in my testing of the new efi_fake_mem=
> > facility to allow memory to be marked "soft reserved" via the kernel
> > command line (commit 199c84717612 x86/efi: Add efi_fake_mem support
> > for EFI_MEMORY_SP). The following command line triggers the crash
> > signature below:
> >
> > efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000
> >
> > However, this command line works ok:
> >
> > efi_fake_mem=8G@9G:0x40000
> >
> > So, something about multiple efi_fake_mem statements interacts badly
> > with this change. Nothing obvious occurs to me at the moment, I'll
> > keep debugging, but wanted to highlight this in the meantime in case
> > someone else sees a deeper issue or the root cause.
>
> Still looking, but this failure does not seem to be specific to the
> "soft reservation" changes. Any update to the efi memmap that pushes
> it over a page boundary triggers this failure. I.e. I can fix the
> problem by over-allocating the efi memmap and then page aligning the
> result. __early_ioremap "should" be handling this case, but it appears
> something else is messing this up.

I seems can not reproduce the bug, but maybe my vm setup is different.
Can you do some debugging about the efi_memmap_insert function see if
something wrong happened, maybe happens when memcpy to some new allocated buffer via
memblock_alloc, just some guess.

Thanks
Dave

2019-12-30 03:33:23

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

On 12/29/19 at 10:24pm, Dave Young wrote:
> Hi Dan,
> On 12/28/19 at 10:13pm, Dan Williams wrote:
> > On Sat, Dec 28, 2019 at 12:54 PM Dan Williams
> > <[email protected]> wrote:
> > >
> > > On Tue, Dec 3, 2019 at 11:53 PM Dave Young <[email protected]> wrote:
> > > >
> > > > Michael Weiser reported he got below error during a kexec rebooting:
> > > > esrt: Unsupported ESRT version 2904149718861218184.
> > > >
> > > > The ESRT memory stays in EFI boot services data, and it was reserved
> > > > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > > > is to reuse the EFI boot services data across kexec reboot. For example
> > > > the BGRT image data and some ESRT memory like Michael reported.
> > > >
> > > > But although the memory is reserved it is not updated in X86 e820 table.
> > > > And kexec_file_load iterate system ram in io resource list to find places
> > > > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > > > initramfs overwritten the ESRT memory and then the failure happened.
> > > >
> > > > Since kexec_file_load depends on the e820 to be updated, just fix this
> > > > by updating the reserved EFI boot services memory as reserved type in e820.
> > > >
> > > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > > > bypassed in the reservation code path because they are assumed as reserved.
> > > > But the reservation is still needed for multiple kexec reboot.
> > > > And it is the only possible case we come here thus just drop the code
> > > > chunk then everything works without side effects.
> > > >
> > > > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > > > not trigger the problem, but I successfully tested with BGRT instead.
> > > > both kexec_load and kexec_file_load work and kdump works as well.
> > > >
> > > > Signed-off-by: Dave Young <[email protected]>
> > > > ---
> > > > arch/x86/platform/efi/quirks.c | 6 ++----
> > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > >
> > > > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > > > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > > > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> > > > return;
> > > > }
> > > >
> > > > - /* No need to reserve regions that will never be freed. */
> > > > - if (md.attribute & EFI_MEMORY_RUNTIME)
> > > > - return;
> > > > -
> > > > size += addr % EFI_PAGE_SIZE;
> > > > size = round_up(size, EFI_PAGE_SIZE);
> > > > addr = round_down(addr, EFI_PAGE_SIZE);
> > > > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> > > > early_memunmap(new, new_size);
> > > >
> > > > efi_memmap_install(new_phys, num_entries);
> > > > + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > > > + e820__update_table(e820_table);
> > > > }
> > > >
> > > > /*
> > > >
> > >
> > > Bisect says this change (commit af1648984828) is triggering a
> > > regression, likely not urgent, in my testing of the new efi_fake_mem=
> > > facility to allow memory to be marked "soft reserved" via the kernel
> > > command line (commit 199c84717612 x86/efi: Add efi_fake_mem support
> > > for EFI_MEMORY_SP). The following command line triggers the crash
> > > signature below:
> > >
> > > efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000
> > >
> > > However, this command line works ok:
> > >
> > > efi_fake_mem=8G@9G:0x40000
> > >
> > > So, something about multiple efi_fake_mem statements interacts badly
> > > with this change. Nothing obvious occurs to me at the moment, I'll
> > > keep debugging, but wanted to highlight this in the meantime in case
> > > someone else sees a deeper issue or the root cause.
> >
> > Still looking, but this failure does not seem to be specific to the
> > "soft reservation" changes. Any update to the efi memmap that pushes
> > it over a page boundary triggers this failure. I.e. I can fix the
> > problem by over-allocating the efi memmap and then page aligning the
> > result. __early_ioremap "should" be handling this case, but it appears
> > something else is messing this up.
>
> I seems can not reproduce the bug, but maybe my vm setup is different.
> Can you do some debugging about the efi_memmap_insert function see if
> something wrong happened, maybe happens when memcpy to some new allocated buffer via
> memblock_alloc, just some guess.

I reproduced some other panics with or without my fix about the efi boot
mem, but not 100% reproducible although high likely. I'm using params
below:
efi_fake_mem=200M@5G:0x40000,300M@5600M:0x40000

I suspect efi_fake_mem needs more careful sanity checks about the memory
user provided, but I'm not very familiar with the details though..

The issues I notices are two different ones:

First one is a panic during booting:
[ 0.210239] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.215983] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 0.216835] #PF: supervisor write access in kernel mode
[ 0.217384] #PF: error_code(0x0002) - not-present page
[ 0.217976] PGD 0 P4D 0
[ 0.218248] Oops: 0002 [#1] SMP PTI
[ 0.218668] CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc3+ #3
[ 0.219315] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 0.220191] RIP: 0010:__free_pages_ok+0x2de/0x5c0
[ 0.220690] Code: 00 00 4b 8d 04 a4 48 c1 e5 04 48 c1 e0 08 48 89 c1 48 01 c5 4b 8d 04 c0 48 c1 e0 03 48 01 c5 48 01 c8 49 8b b4 2e c0 00 00 00 <48> 89 7e 08 48 89 73 08 48 89 53 10 48 89 3a 49 83 84 06 00 01 00
[ 0.222843] RSP: 0000:ffffffff97e03e60 EFLAGS: 00010002
[ 0.223400] RAX: 00000000000007d0 RBX: ffffda4d00050000 RCX: 0000000000000500
[ 0.224200] RDX: ffffa2a9bd3fe900 RSI: 0000000000000000 RDI: ffffda4d00050008
[ 0.224984] RBP: 0000000000000840 R08: 000000000000000a R09: 0000000000000400
[ 0.225825] R10: dead000000000100 R11: 0000000000000039 R12: 0000000000000001
[ 0.226793] R13: ffffa2a9bd3fe500 R14: ffffa2a9bd3fe000 R15: 000000000000000a
[ 0.227764] FS: 0000000000000000(0000) GS:ffffa2a9b7600000(0000) knlGS:0000000000000000
[ 0.228799] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.229511] CR2: 0000000000000008 CR3: 00000001cf80a001 CR4: 00000000000606b0
[ 0.230323] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.231399] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.232219] Call Trace:
[ 0.232524] memblock_free_all+0x127/0x195
[ 0.233090] mem_init+0x15/0x9d
[ 0.233450] start_kernel+0x215/0x4e0
[ 0.233875] ? load_ucode_bsp+0x3e/0x11b
[ 0.234325] secondary_startup_64+0xa4/0xb0
[ 0.234833] Modules linked in:
[ 0.235197] CR2: 0000000000000008
[ 0.235610] random: get_random_bytes called from print_oops_end_marker+0x26/0x40 with crng_init=0
[ 0.237389] ---[ end trace 58f36740c65a5535 ]---
[ 0.238111] RIP: 0010:__free_pages_ok+0x2de/0x5c0
[ 0.238747] Code: 00 00 4b 8d 04 a4 48 c1 e5 04 48 c1 e0 08 48 89 c1 48 01 c5 4b 8d 04 c0 48 c1 e0 03 48 01 c5 48 01 c8 49 8b b4 2e c0 00 00 00 <48> 89 7e 08 48 89 73 08 48 89 53 10 48 89 3a 49 83 84 06 00 01 00
[ 0.241001] RSP: 0000:ffffffff97e03e60 EFLAGS: 00010002
[ 0.241618] RAX: 00000000000007d0 RBX: ffffda4d00050000 RCX: 0000000000000500
[ 0.242461] RDX: ffffa2a9bd3fe900 RSI: 0000000000000000 RDI: ffffda4d00050008
[ 0.243308] RBP: 0000000000000840 R08: 000000000000000a R09: 0000000000000400
[ 0.244161] R10: dead000000000100 R11: 0000000000000039 R12: 0000000000000001
[ 0.245093] R13: ffffa2a9bd3fe500 R14: ffffa2a9bd3fe000 R15: 000000000000000a
[ 0.245995] FS: 0000000000000000(0000) GS:ffffa2a9b7600000(0000) knlGS:0000000000000000
[ 0.246972] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.247863] CR2: 0000000000000008 CR3: 00000001cf80a001 CR4: 00000000000606b0
[ 0.248950] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.249978] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.250976] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.251915] Rebooting in 10 seconds..

The second one is when reading some files after login, eg. cat
/proc/iomem:

[ 51.732899] BUG: unable to handle page fault for address: 000000007cfa9028
[ 51.736351] #PF: supervisor read access in kernel mode
[ 51.737549] #PF: error_code(0x0000) - not-present page
[ 51.738645] PGD 0 P4D 0
[ 51.738929] Oops: 0000 [#1] SMP PTI
[ 51.739290] CPU: 1 PID: 467 Comm: cat Not tainted 5.5.0-rc3+ #3
[ 51.740040] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 51.741084] RIP: 0010:r_show+0x33/0xc0
[ 51.741591] Code: fd 41 54 55 53 48 8b 47 70 48 89 f3 48 8b 78 20 e8 c2 1f 1d 00 48 89 da 48 81 78 08 00 00 01 00 19 ed 31 c9 83 e5 fc 83 c5 08 <48> 8b 52 28 48 39 c2 74 73 83 c1 01 83 f9 05 75 ef 41 bc 0a 00 00
[ 51.743884] RSP: 0018:ffff9a983145be30 EFLAGS: 00010202
[ 51.744538] RAX: ffffffff94632c00 RBX: 000000007cfa9000 RCX: 0000000000000000
[ 51.745359] RDX: 000000007cfa9000 RSI: 000000007cfa9000 RDI: ffff9a9828138048
[ 51.746363] RBP: 0000000000000008 R08: 0000000000000039 R09: 0000000000000005
[ 51.750059] R10: ffff9a9834cde000 R11: ffff9a9934cdd032 R12: ffff9a9834c95700
[ 51.750917] R13: ffff9a982d25f800 R14: ffff9a982d25f828 R15: ffff9a982d25f840
[ 51.751759] FS: 00007ff33164f580(0000) GS:ffff9a9837680000(0000) knlGS:0000000000000000
[ 51.752678] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.753336] CR2: 000000007cfa9028 CR3: 00000001f3eae001 CR4: 0000000000160ee0
[ 51.754172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 51.755076] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 51.756199] Call Trace:
[ 51.756653] seq_read+0x2f7/0x410
[ 51.757284] proc_reg_read+0x3c/0x60
[ 51.757752] vfs_read+0x9d/0x120
[ 51.758247] ksys_read+0x5f/0xe0
[ 51.758871] do_syscall_64+0x6b/0x260
[ 51.759345] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 51.759976] RIP: 0033:0x7ff331577232
[ 51.760419] Code: c0 e9 c2 fe ff ff 50 48 8d 3d 0a 16 0a 00 e8 05 f1 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 51.763124] RSP: 002b:00007ffdaae3e8b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 51.764430] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ff331577232
[ 51.765690] RDX: 0000000000020000 RSI: 00007ff32447e000 RDI: 0000000000000003
[ 51.766937] RBP: 00007ff32447e000 R08: 00007ff32447d010 R09: 0000000000000000
[ 51.769215] R10: 0000000000000022 R11: 0000000000000246 R12: 00005632bd03d1f0
[ 51.770531] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[ 51.771816] Modules linked in:
[ 51.772637] CR2: 000000007cfa9028
[ 51.773462] ---[ end trace 0eb61054f7dfd62d ]---
[ 52.021257] RIP: 0010:r_show+0x33/0xc0
[ 52.022625] Code: fd 41 54 55 53 48 8b 47 70 48 89 f3 48 8b 78 20 e8 c2 1f 1d 00 48 89 da 48 81 78 08 00 00 01 00 19 ed 31 c9 83 e5 fc 83 c5 08 <48> 8b 52 28 48 39 c2 74 73 83 c1 01 83 f9 05 75 ef 41 bc 0a 00 00
[ 52.025914] RSP: 0018:ffff9a983145be30 EFLAGS: 00010202
[ 52.027027] RAX: ffffffff94632c00 RBX: 000000007cfa9000 RCX: 0000000000000000
[ 52.028429] RDX: 000000007cfa9000 RSI: 000000007cfa9000 RDI: ffff9a9828138048
[ 52.029986] RBP: 0000000000000008 R08: 0000000000000039 R09: 0000000000000005
[ 52.031495] R10: ffff9a9834cde000 R11: ffff9a9934cdd032 R12: ffff9a9834c95700
[ 52.033053] R13: ffff9a982d25f800 R14: ffff9a982d25f828 R15: ffff9a982d25f840
[ 52.035086] FS: 00007ff33164f580(0000) GS:ffff9a9837680000(0000) knlGS:0000000000000000
[ 52.036945] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 52.039737] CR2: 000000007cfa9028 CR3: 00000001f3eae001 CR4: 0000000000160ee0
[ 52.041116] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 52.042322] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Thanks
Dave

2019-12-30 05:56:45

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage


Hi Dan,
[snip]
> I reproduced some other panics with or without my fix about the efi boot
> mem, but not 100% reproducible although high likely. I'm using params
> below:
> efi_fake_mem=200M@5G:0x40000,300M@5600M:0x40000
>
> I suspect efi_fake_mem needs more careful sanity checks about the memory
> user provided, but I'm not very familiar with the details though..
>
> The issues I notices are two different ones:
> First one is a panic during booting:
> [ 0.210239] mem auto-init: stack:off, heap alloc:off, heap free:off
> [ 0.215983] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [ 0.216835] #PF: supervisor write access in kernel mode
> [ 0.217384] #PF: error_code(0x0002) - not-present page
> [ 0.217976] PGD 0 P4D 0
> [ 0.218248] Oops: 0002 [#1] SMP PTI
> [ 0.218668] CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc3+ #3
> [ 0.219315] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [ 0.220191] RIP: 0010:__free_pages_ok+0x2de/0x5c0
> [ 0.220690] Code: 00 00 4b 8d 04 a4 48 c1 e5 04 48 c1 e0 08 48 89 c1 48 01 c5 4b 8d 04 c0 48 c1 e0 03 48 01 c5 48 01 c8 49 8b b4 2e c0 00 00 00 <48> 89 7e 08 48 89 73 08 48 89 53 10 48 89 3a 49 83 84 06 00 01 00
> [ 0.222843] RSP: 0000:ffffffff97e03e60 EFLAGS: 00010002
> [ 0.223400] RAX: 00000000000007d0 RBX: ffffda4d00050000 RCX: 0000000000000500
> [ 0.224200] RDX: ffffa2a9bd3fe900 RSI: 0000000000000000 RDI: ffffda4d00050008
> [ 0.224984] RBP: 0000000000000840 R08: 000000000000000a R09: 0000000000000400
> [ 0.225825] R10: dead000000000100 R11: 0000000000000039 R12: 0000000000000001
> [ 0.226793] R13: ffffa2a9bd3fe500 R14: ffffa2a9bd3fe000 R15: 000000000000000a
> [ 0.227764] FS: 0000000000000000(0000) GS:ffffa2a9b7600000(0000) knlGS:0000000000000000
> [ 0.228799] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.229511] CR2: 0000000000000008 CR3: 00000001cf80a001 CR4: 00000000000606b0
> [ 0.230323] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.231399] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 0.232219] Call Trace:
> [ 0.232524] memblock_free_all+0x127/0x195
> [ 0.233090] mem_init+0x15/0x9d
> [ 0.233450] start_kernel+0x215/0x4e0
> [ 0.233875] ? load_ucode_bsp+0x3e/0x11b
> [ 0.234325] secondary_startup_64+0xa4/0xb0
> [ 0.234833] Modules linked in:
> [ 0.235197] CR2: 0000000000000008
> [ 0.235610] random: get_random_bytes called from print_oops_end_marker+0x26/0x40 with crng_init=0
> [ 0.237389] ---[ end trace 58f36740c65a5535 ]---
> [ 0.238111] RIP: 0010:__free_pages_ok+0x2de/0x5c0
> [ 0.238747] Code: 00 00 4b 8d 04 a4 48 c1 e5 04 48 c1 e0 08 48 89 c1 48 01 c5 4b 8d 04 c0 48 c1 e0 03 48 01 c5 48 01 c8 49 8b b4 2e c0 00 00 00 <48> 89 7e 08 48 89 73 08 48 89 53 10 48 89 3a 49 83 84 06 00 01 00
> [ 0.241001] RSP: 0000:ffffffff97e03e60 EFLAGS: 00010002
> [ 0.241618] RAX: 00000000000007d0 RBX: ffffda4d00050000 RCX: 0000000000000500
> [ 0.242461] RDX: ffffa2a9bd3fe900 RSI: 0000000000000000 RDI: ffffda4d00050008
> [ 0.243308] RBP: 0000000000000840 R08: 000000000000000a R09: 0000000000000400
> [ 0.244161] R10: dead000000000100 R11: 0000000000000039 R12: 0000000000000001
> [ 0.245093] R13: ffffa2a9bd3fe500 R14: ffffa2a9bd3fe000 R15: 000000000000000a
> [ 0.245995] FS: 0000000000000000(0000) GS:ffffa2a9b7600000(0000) knlGS:0000000000000000
> [ 0.246972] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.247863] CR2: 0000000000000008 CR3: 00000001cf80a001 CR4: 00000000000606b0
> [ 0.248950] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.249978] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 0.250976] Kernel panic - not syncing: Attempted to kill the idle task!
> [ 0.251915] Rebooting in 10 seconds..
>
> The second one is when reading some files after login, eg. cat
> /proc/iomem:
>
> [ 51.732899] BUG: unable to handle page fault for address: 000000007cfa9028
> [ 51.736351] #PF: supervisor read access in kernel mode
> [ 51.737549] #PF: error_code(0x0000) - not-present page
> [ 51.738645] PGD 0 P4D 0
> [ 51.738929] Oops: 0000 [#1] SMP PTI
> [ 51.739290] CPU: 1 PID: 467 Comm: cat Not tainted 5.5.0-rc3+ #3
> [ 51.740040] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [ 51.741084] RIP: 0010:r_show+0x33/0xc0
> [ 51.741591] Code: fd 41 54 55 53 48 8b 47 70 48 89 f3 48 8b 78 20 e8 c2 1f 1d 00 48 89 da 48 81 78 08 00 00 01 00 19 ed 31 c9 83 e5 fc 83 c5 08 <48> 8b 52 28 48 39 c2 74 73 83 c1 01 83 f9 05 75 ef 41 bc 0a 00 00
> [ 51.743884] RSP: 0018:ffff9a983145be30 EFLAGS: 00010202
> [ 51.744538] RAX: ffffffff94632c00 RBX: 000000007cfa9000 RCX: 0000000000000000
> [ 51.745359] RDX: 000000007cfa9000 RSI: 000000007cfa9000 RDI: ffff9a9828138048
> [ 51.746363] RBP: 0000000000000008 R08: 0000000000000039 R09: 0000000000000005
> [ 51.750059] R10: ffff9a9834cde000 R11: ffff9a9934cdd032 R12: ffff9a9834c95700
> [ 51.750917] R13: ffff9a982d25f800 R14: ffff9a982d25f828 R15: ffff9a982d25f840
> [ 51.751759] FS: 00007ff33164f580(0000) GS:ffff9a9837680000(0000) knlGS:0000000000000000
> [ 51.752678] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 51.753336] CR2: 000000007cfa9028 CR3: 00000001f3eae001 CR4: 0000000000160ee0
> [ 51.754172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 51.755076] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 51.756199] Call Trace:
> [ 51.756653] seq_read+0x2f7/0x410
> [ 51.757284] proc_reg_read+0x3c/0x60
> [ 51.757752] vfs_read+0x9d/0x120
> [ 51.758247] ksys_read+0x5f/0xe0
> [ 51.758871] do_syscall_64+0x6b/0x260
> [ 51.759345] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 51.759976] RIP: 0033:0x7ff331577232
> [ 51.760419] Code: c0 e9 c2 fe ff ff 50 48 8d 3d 0a 16 0a 00 e8 05 f1 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
> [ 51.763124] RSP: 002b:00007ffdaae3e8b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 51.764430] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ff331577232
> [ 51.765690] RDX: 0000000000020000 RSI: 00007ff32447e000 RDI: 0000000000000003
> [ 51.766937] RBP: 00007ff32447e000 R08: 00007ff32447d010 R09: 0000000000000000
> [ 51.769215] R10: 0000000000000022 R11: 0000000000000246 R12: 00005632bd03d1f0
> [ 51.770531] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
> [ 51.771816] Modules linked in:
> [ 51.772637] CR2: 000000007cfa9028
> [ 51.773462] ---[ end trace 0eb61054f7dfd62d ]---

BTW, just tried revert below patch
x86-efi-Add-efi_fake_mem-support-for-EFI_MEMORY_SP.patch

Then the first issue did not happen any more, but the second one (cat
/proc/iomem issue) still happened.

Thanks
Dave

2019-12-30 09:43:20

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

On Sat, Dec 28, 2019 at 10:13 PM Dan Williams <[email protected]> wrote:
>
> On Sat, Dec 28, 2019 at 12:54 PM Dan Williams
> <[email protected]> wrote:
> >
> > On Tue, Dec 3, 2019 at 11:53 PM Dave Young <[email protected]> wrote:
> > >
> > > Michael Weiser reported he got below error during a kexec rebooting:
> > > esrt: Unsupported ESRT version 2904149718861218184.
> > >
> > > The ESRT memory stays in EFI boot services data, and it was reserved
> > > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > > is to reuse the EFI boot services data across kexec reboot. For example
> > > the BGRT image data and some ESRT memory like Michael reported.
> > >
> > > But although the memory is reserved it is not updated in X86 e820 table.
> > > And kexec_file_load iterate system ram in io resource list to find places
> > > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > > initramfs overwritten the ESRT memory and then the failure happened.
> > >
> > > Since kexec_file_load depends on the e820 to be updated, just fix this
> > > by updating the reserved EFI boot services memory as reserved type in e820.
> > >
> > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > > bypassed in the reservation code path because they are assumed as reserved.
> > > But the reservation is still needed for multiple kexec reboot.
> > > And it is the only possible case we come here thus just drop the code
> > > chunk then everything works without side effects.
> > >
> > > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > > not trigger the problem, but I successfully tested with BGRT instead.
> > > both kexec_load and kexec_file_load work and kdump works as well.
> > >
> > > Signed-off-by: Dave Young <[email protected]>
> > > ---
> > > arch/x86/platform/efi/quirks.c | 6 ++----
> > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> > > return;
> > > }
> > >
> > > - /* No need to reserve regions that will never be freed. */
> > > - if (md.attribute & EFI_MEMORY_RUNTIME)
> > > - return;
> > > -
> > > size += addr % EFI_PAGE_SIZE;
> > > size = round_up(size, EFI_PAGE_SIZE);
> > > addr = round_down(addr, EFI_PAGE_SIZE);
> > > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> > > early_memunmap(new, new_size);
> > >
> > > efi_memmap_install(new_phys, num_entries);
> > > + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > > + e820__update_table(e820_table);
> > > }
> > >
> > > /*
> > >
> >
> > Bisect says this change (commit af1648984828) is triggering a
> > regression, likely not urgent, in my testing of the new efi_fake_mem=
> > facility to allow memory to be marked "soft reserved" via the kernel
> > command line (commit 199c84717612 x86/efi: Add efi_fake_mem support
> > for EFI_MEMORY_SP). The following command line triggers the crash
> > signature below:
> >
> > efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000
> >
> > However, this command line works ok:
> >
> > efi_fake_mem=8G@9G:0x40000
> >
> > So, something about multiple efi_fake_mem statements interacts badly
> > with this change. Nothing obvious occurs to me at the moment, I'll
> > keep debugging, but wanted to highlight this in the meantime in case
> > someone else sees a deeper issue or the root cause.
>
> Still looking, but this failure does not seem to be specific to the
> "soft reservation" changes. Any update to the efi memmap that pushes
> it over a page boundary triggers this failure. I.e. I can fix the
> problem by over-allocating the efi memmap and then page aligning the
> result. __early_ioremap "should" be handling this case, but it appears
> something else is messing this up.

Found it. Neither this patch nor the soft reservation changes are at
fault, they are just helping to trigger a long standing bug in
efi_fake_memmap(). Its usage of efi_memmap_split_count() can over
count the number of splits needed for new entries. Consider the case
of 2 contiguous fake entries intersecting the end of a single entry.
The first call to efi_memmap_split_count() determines the resulting
split will be (old1, new1, old2), the second call determines (old1,
new2). The result is 2 splits when only 1 is needed to get a result of
(old1, new1, new2) and the new map ends up with an empty entry.
efi_memmap_install() interprets an empty entry as start = 0 end =
0xffffffffffffffff and attempts an extra split / copy past the end of
the new map.

I'll send a patch to fix up efi_fake_memmap().

2019-12-30 10:51:17

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

On 12/30/19 at 01:42am, Dan Williams wrote:
> On Sat, Dec 28, 2019 at 10:13 PM Dan Williams <[email protected]> wrote:
> >
> > On Sat, Dec 28, 2019 at 12:54 PM Dan Williams
> > <[email protected]> wrote:
> > >
> > > On Tue, Dec 3, 2019 at 11:53 PM Dave Young <[email protected]> wrote:
> > > >
> > > > Michael Weiser reported he got below error during a kexec rebooting:
> > > > esrt: Unsupported ESRT version 2904149718861218184.
> > > >
> > > > The ESRT memory stays in EFI boot services data, and it was reserved
> > > > in kernel via efi_mem_reserve(). The initial purpose of the reservation
> > > > is to reuse the EFI boot services data across kexec reboot. For example
> > > > the BGRT image data and some ESRT memory like Michael reported.
> > > >
> > > > But although the memory is reserved it is not updated in X86 e820 table.
> > > > And kexec_file_load iterate system ram in io resource list to find places
> > > > for kernel, initramfs and other stuff. In Michael's case the kexec loaded
> > > > initramfs overwritten the ESRT memory and then the failure happened.
> > > >
> > > > Since kexec_file_load depends on the e820 to be updated, just fix this
> > > > by updating the reserved EFI boot services memory as reserved type in e820.
> > > >
> > > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are
> > > > bypassed in the reservation code path because they are assumed as reserved.
> > > > But the reservation is still needed for multiple kexec reboot.
> > > > And it is the only possible case we come here thus just drop the code
> > > > chunk then everything works without side effects.
> > > >
> > > > On my machine the ESRT memory sits in an EFI runtime data range, it does
> > > > not trigger the problem, but I successfully tested with BGRT instead.
> > > > both kexec_load and kexec_file_load work and kdump works as well.
> > > >
> > > > Signed-off-by: Dave Young <[email protected]>
> > > > ---
> > > > arch/x86/platform/efi/quirks.c | 6 ++----
> > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > >
> > > > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > > > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > > > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad
> > > > return;
> > > > }
> > > >
> > > > - /* No need to reserve regions that will never be freed. */
> > > > - if (md.attribute & EFI_MEMORY_RUNTIME)
> > > > - return;
> > > > -
> > > > size += addr % EFI_PAGE_SIZE;
> > > > size = round_up(size, EFI_PAGE_SIZE);
> > > > addr = round_down(addr, EFI_PAGE_SIZE);
> > > > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad
> > > > early_memunmap(new, new_size);
> > > >
> > > > efi_memmap_install(new_phys, num_entries);
> > > > + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > > > + e820__update_table(e820_table);
> > > > }
> > > >
> > > > /*
> > > >
> > >
> > > Bisect says this change (commit af1648984828) is triggering a
> > > regression, likely not urgent, in my testing of the new efi_fake_mem=
> > > facility to allow memory to be marked "soft reserved" via the kernel
> > > command line (commit 199c84717612 x86/efi: Add efi_fake_mem support
> > > for EFI_MEMORY_SP). The following command line triggers the crash
> > > signature below:
> > >
> > > efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000
> > >
> > > However, this command line works ok:
> > >
> > > efi_fake_mem=8G@9G:0x40000
> > >
> > > So, something about multiple efi_fake_mem statements interacts badly
> > > with this change. Nothing obvious occurs to me at the moment, I'll
> > > keep debugging, but wanted to highlight this in the meantime in case
> > > someone else sees a deeper issue or the root cause.
> >
> > Still looking, but this failure does not seem to be specific to the
> > "soft reservation" changes. Any update to the efi memmap that pushes
> > it over a page boundary triggers this failure. I.e. I can fix the
> > problem by over-allocating the efi memmap and then page aligning the
> > result. __early_ioremap "should" be handling this case, but it appears
> > something else is messing this up.
>
> Found it. Neither this patch nor the soft reservation changes are at
> fault, they are just helping to trigger a long standing bug in
> efi_fake_memmap(). Its usage of efi_memmap_split_count() can over
> count the number of splits needed for new entries. Consider the case
> of 2 contiguous fake entries intersecting the end of a single entry.
> The first call to efi_memmap_split_count() determines the resulting
> split will be (old1, new1, old2), the second call determines (old1,
> new2). The result is 2 splits when only 1 is needed to get a result of
> (old1, new1, new2) and the new map ends up with an empty entry.
> efi_memmap_install() interprets an empty entry as start = 0 end =
> 0xffffffffffffffff and attempts an extra split / copy past the end of
> the new map.
>
> I'll send a patch to fix up efi_fake_memmap().
>

Cool, I also noticed if two of fake mem used, we only get one md with
"SP" attribute in print_efi_memmap, that is the root cause.

Thanks
Dave

2019-12-30 20:17:52

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage

On Mon, Dec 30, 2019 at 1:42 AM Dan Williams <[email protected]> wrote:
[..]
> I'll send a patch to fix up efi_fake_memmap().

For others following this thread, that patch is here:

http://lore.kernel.org/r/157773590338.4153451.5898675419563883883.stgit@dwillia2-desk3.amr.corp.intel.com