2022-10-28 13:35:21

by ns

Subject: Bug: kexec on Lenovo ThinkPad T480 disables EFI mode

Greetings,

I've been hitting a bug on my Lenovo ThinkPad T480 where kexecing will
cause EFI mode (if that's the right term for it) to be unconditionally
disabled, even when not using the --noefi option to kexec.

What I mean by "EFI mode" being disabled, more than just EFI runtime
services, is that basically nothing about the system's EFI is visible
post-kexec. Normally you have a message like this in dmesg when the
system is booted in EFI mode:

[ 0.000000] efi: EFI v2.70 by EDK II
[ 0.000000] efi: SMBIOS=0x7f98a000 ACPI=0x7fb7e000 ACPI 2.0=0x7fb7e014 MEMATTR=0x7ec63018
(obviously not the real firmware of the machine I'm talking about, but I
can also send that if it would be of any help)

No such message pops up in my dmesg as a result of this bug, & this
causes some fallout like being unable to find the system's DMI
information:

<6>[ 0.000000] DMI not present or invalid.

The efivarfs module also fails to load with -ENODEV.

I've also tried booting with efi=runtime explicitly, but it doesn't
change anything. The kernel still does not print the name of the EFI
firmware, DMI is still missing, & efivarfs still fails to load.
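
For anyone comparing a direct UEFI boot against a post-kexec boot, the two log symptoms described above can be checked mechanically. The following is only a minimal sketch: the function name `efi_symptoms` is hypothetical, and the patterns are assumed to match the exact message wording quoted in this report, which may differ between kernel versions.

```python
import re

# Patterns based on the log lines quoted in the report; other kernel
# versions may word these messages differently (assumption).
EFI_BANNER = re.compile(r"efi: EFI v\d+\.\d+ by ")
DMI_MISSING = "DMI not present or invalid."


def efi_symptoms(dmesg_text: str) -> dict:
    """Report whether the EFI banner and the DMI failure appear in a dmesg dump."""
    lines = dmesg_text.splitlines()
    return {
        "efi_banner_seen": any(EFI_BANNER.search(line) for line in lines),
        "dmi_missing": any(DMI_MISSING in line for line in lines),
    }
```

Feeding it the saved output of `dmesg` from both boots should show the banner present only in the direct boot if the problem described here is being hit.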

I've been using the kexec_load syscall for all these tests, if it's
important.

Also, to make it very clear, all this only ever happens post-kexec. When
booting straight from UEFI (with the EFI stub), all the aforementioned
stuff that fails works perfectly fine (i.e. name of firmware is printed,
DMI is properly found, & efivarfs loads & mounts just fine).

This is reproducible with a vanilla 6.1-rc2 kernel. I've been trying to
bisect it, but it seems like it goes pretty far back. I've got vanilla
mainline kernel builds dating back to 5.17 that have the exact same
issue. It might be worth noting that during this testing, I made sure
the version of the kernel being kexeced & the kernel kexecing were the
same version. It may not have been a problem in older kernels, but that
would be difficult to test for me (a pretty important driver for this
machine was only merged during v5.17-rc4). So it may not have been a
regression & just a hidden problem since time immemorial.

I am willing to test any patches I may get to further debug or fix
this issue, preferably based on the current state of torvalds/linux.git.
I can build & test kernels quite a few times per day.

I can also send any important materials (kernel config, dmesg, firmware
information, & so forth) on request. I'll also mention up front that I'm
using kexec-tools 2.0.24, if it matters.

Regards,


2022-11-05 03:47:59

by Baoquan He

Subject: Re: Bug: kexec on Lenovo ThinkPad T480 disables EFI mode

Add Dave to CC

On 10/28/22 at 01:02pm, [email protected] wrote:
> Greetings,
>
> I've been hitting a bug on my Lenovo ThinkPad T480 where kexecing will
> cause EFI mode (if that's the right term for it) to be unconditionally
> disabled, even when not using the --noefi option to kexec.


2022-11-05 06:27:35

by Dave Young

Subject: Re: Bug: kexec on Lenovo ThinkPad T480 disables EFI mode

Baoquan, thanks for CCing me.

On Sat, 5 Nov 2022 at 11:10, Baoquan He <[email protected]> wrote:
>
> Add Dave to CC
>

Can you check the EFI runtime map in sysfs:
ls /sys/firmware/efi/runtime-map/

If there is nothing there, then maybe you did not enable
CONFIG_EFI_RUNTIME_MAP=y; it is needed for kexec UEFI boot on x86_64.

Otherwise, you can add debug printf calls to the kexec-tools EFI error
path to see what is wrong:
kexec/arch/i386/x86-linux-setup.c : function setup_efi_data

And if it still does not work, please post your kernel config. I can
give it a try, although I do not have the T480 now.
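
The two checks suggested above (is the runtime map exported in sysfs, and is the option set in the kernel config) could be scripted roughly as follows. This is a sketch only: the helper names `runtime_map_entries` and `config_option_enabled` are hypothetical, and the caller is assumed to supply the config text from wherever the running kernel's config is available on that system.

```python
from pathlib import Path


def runtime_map_entries(
    sysfs_root: Path = Path("/sys/firmware/efi/runtime-map"),
) -> list:
    """Return sorted entry names under runtime-map, or [] if the directory is absent."""
    if not sysfs_root.is_dir():
        return []
    return sorted(p.name for p in sysfs_root.iterdir())


def config_option_enabled(
    config_text: str, option: str = "CONFIG_EFI_RUNTIME_MAP"
) -> bool:
    """True if a kernel .config dump contains the line `<option>=y`."""
    return any(line.strip() == f"{option}=y" for line in config_text.splitlines())
```

An empty list from `runtime_map_entries()` on the pre-kexec kernel, together with `config_option_enabled(...)` returning False, would point at the config option rather than at kexec-tools.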




2022-11-05 14:20:23

by ns

Subject: Re: Bug: kexec on Lenovo ThinkPad T480 disables EFI mode

On 2022-11-05 05:49, Dave Young wrote:
>
> Can you check the efi runtime in sysfs:
> ls /sys/firmware/efi/runtime-map/
>
> If nothing then maybe you did not enable CONFIG_EFI_RUNTIME_MAP=y, it
> is needed for kexec UEFI boot on x86_64.

Oh my, it really is that simple.

Indeed, enabling this in the pre-kexec kernel fixes it all up. I had
blindly disabled it in my quest to downsize the pre-kexec kernel to
reduce boot time (it only runs a bootloader). In hindsight, the firmware
drivers section is not really a good section to tweak on a whim.

I'm terribly sorry to have taken your time to "fix" this "bug". But I
must ask, is there any reason why this is a visible config option, or at
least not gated behind CONFIG_EXPERT? drivers/firmware/efi/runtime-map.c
is pretty tiny, & considering it depends on CONFIG_KEXEC_CORE, one
probably wants to have kexec work properly if they can even enable it.
I admit the help text for it is arguably pretty good, but I feel like
the config option is only really useful for embedded, the same
environments where people would disable stuff like CONFIG_DMI. That is a
config option that I would argue is pretty justifiably gated behind
CONFIG_EXPERT, because far too many systems break without it & it's
pretty small code, so disabling it is really not worth it unless you
absolutely know what you're doing. Similarly, I don't really think
there's much value in disabling the ability to kexec without losing the
firmware except if you're heavily informed & must have the size
reduction, especially since in EFI land that's where your DMI info comes
from, if I were to argue for it on the basis of CONFIG_DMI being gated.
In summary, it can cause quite a bit of unnecessary confusion despite
only being useful to a very small minority of users.

Thank you!


2022-11-07 07:15:49

by Dave Young

Subject: Re: Bug: kexec on Lenovo ThinkPad T480 disables EFI mode

Hi,

On Sat, 5 Nov 2022 at 22:16, <[email protected]> wrote:
>
> On 2022-11-05 05:49, Dave Young wrote:
> >
> > Can you check the efi runtime in sysfs:
> > ls /sys/firmware/efi/runtime-map/
> >
> > If nothing then maybe you did not enable CONFIG_EFI_RUNTIME_MAP=y, it
> > is needed for kexec UEFI boot on x86_64.
>
> Oh my, it really is that simple.
>
> Indeed, enabling this in the pre-kexec kernel fixes it all up. I had
> blindly disabled it in my quest to downsize the pre-kexec kernel to
> reduce boot time (it only runs a bootloader). In hindsight, the firmware
> drivers section is not really a good section to tweak on a whim.
>
> I'm terribly sorry to have taken your time to "fix" this "bug". But I
> must ask, is there any reason why this is a visible config option, or at
> least not gated behind CONFIG_EXPERT? drivers/firmware/efi/runtime-map.c
> is pretty tiny, & considering it depends on CONFIG_KEXEC_CORE, one
> probably wants to have kexec work properly if they can even enable it.

Glad to know it works with the .config tweak. I cannot recall any
reason for that, though.

Since it sits in the EFI code path, let's see what Ard thinks about
your proposal.

Thanks
Dave


2022-11-07 08:11:46

by Ard Biesheuvel

Subject: Re: Bug: kexec on Lenovo ThinkPad T480 disables EFI mode

On Mon, 7 Nov 2022 at 07:55, Dave Young <[email protected]> wrote:
>
> Hi,
>
> On Sat, 5 Nov 2022 at 22:16, <[email protected]> wrote:
> >
> > On 2022-11-05 05:49, Dave Young wrote:
> > >
> > > Can you check the efi runtime in sysfs:
> > > ls /sys/firmware/efi/runtime-map/
> > >
> > > If nothing then maybe you did not enable CONFIG_EFI_RUNTIME_MAP=y, it
> > > is needed for kexec UEFI boot on x86_64.
> >
> > Oh my, it really is that simple.
> >
> > Indeed, enabling this in the pre-kexec kernel fixes it all up. I had
> > blindly disabled it in my quest to downsize the pre-kexec kernel to
> > reduce boot time (it only runs a bootloader). In hindsight, the firmware
> > drivers section is not really a good section to tweak on a whim.
> >
> > I'm terribly sorry to have taken your time to "fix" this "bug". But I
> > must ask, is there any reason why this is a visible config option, or at
> > least not gated behind CONFIG_EXPERT? drivers/firmware/efi/runtime-map.c
> > is pretty tiny, & considering it depends on CONFIG_KEXEC_CORE, one
> > probably wants to have kexec work properly if they can even enable it.
>
> Glad to know it works with the .config tweaking. I can not recall any
> reason for that though.
>
> Since it sits in the efi code path, let's see how Ard thinks about
> your proposal.
>

I don't understand why EFI_RUNTIME_MAP should depend on KEXEC_CORE at
all: it is documented as a feature that can be enabled for debugging
as well, and kexec does not work as expected without it.

Should we just change it like this perhaps?

--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -28,8 +28,8 @@ config EFI_VARS_PSTORE_DEFAULT_DISABLE

 config EFI_RUNTIME_MAP
 	bool "Export efi runtime maps to sysfs"
-	depends on X86 && EFI && KEXEC_CORE
-	default y
+	depends on X86 && EFI
+	default KEXEC_CORE
 	help

and maybe add an 'if EXPERT' so that the option is only visible and
modifiable when CONFIG_EXPERT=y?

In any case, I intend to move this code into arch/x86 as well, so I'll
have a couple of patches out shortly.
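
To make the effect of the proposed Kconfig change easier to compare with the current rule, here is a deliberately simplified Python model. It is not how Kconfig is actually evaluated; the function names and the `gate_on_expert` parameter are hypothetical, and `user_choice=None` is assumed to mean "the user never touched the option".

```python
from typing import Optional


def old_rule(x86: bool, efi: bool, kexec_core: bool,
             user_choice: Optional[bool] = None) -> bool:
    """Current rule: depends on X86 && EFI && KEXEC_CORE, default y."""
    if not (x86 and efi and kexec_core):
        return False  # dependencies unmet: the option cannot be enabled
    return True if user_choice is None else user_choice


def proposed_rule(x86: bool, efi: bool, kexec_core: bool,
                  expert: bool = False,
                  user_choice: Optional[bool] = None,
                  gate_on_expert: bool = False) -> bool:
    """Proposed rule: depends on X86 && EFI, default KEXEC_CORE.

    With gate_on_expert=True the prompt is modelled as 'if EXPERT',
    so a user choice only counts when expert is set.
    """
    if not (x86 and efi):
        return False
    prompt_visible = expert or not gate_on_expert
    if prompt_visible and user_choice is not None:
        return user_choice
    return kexec_core  # otherwise the default follows KEXEC_CORE
```

Under this model, the reporter's situation (`old_rule(True, True, True, user_choice=False)`) yields False, whereas with the EXPERT gate and `expert=False` the same attempt to switch the option off would be ignored and the value would follow KEXEC_CORE.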

2022-11-07 08:11:47

by Dave Young

Subject: Re: Bug: kexec on Lenovo ThinkPad T480 disables EFI mode

Hi Ard,

On Mon, 7 Nov 2022 at 15:30, Ard Biesheuvel <[email protected]> wrote:
>
> On Mon, 7 Nov 2022 at 07:55, Dave Young <[email protected]> wrote:
> >
> > Hi,
> >
> > On Sat, 5 Nov 2022 at 22:16, <[email protected]> wrote:
> > >
> > > On 2022-11-05 05:49, Dave Young wrote:
> > > >
> > > > Can you check the efi runtime in sysfs:
> > > > ls /sys/firmware/efi/runtime-map/
> > > >
> > > > If nothing then maybe you did not enable CONFIG_EFI_RUNTIME_MAP=y, it
> > > > is needed for kexec UEFI boot on x86_64.
> > >
> > > Oh my, it really is that simple.
> > >
> > > Indeed, enabling this in the pre-kexec kernel fixes it all up. I had
> > > blindly disabled it in my quest to downsize the pre-kexec kernel to
> > > reduce boot time (it only runs a bootloader). In hindsight, the firmware
> > > drivers section is not really a good section to tweak on a whim.
> > >
> > > I'm terribly sorry to have taken your time to "fix" this "bug". But I
> > > must ask, is there any reason why this is a visible config option, or at
> > > least not gated behind CONFIG_EXPERT? drivers/firmware/efi/runtime-map.c
> > > is pretty tiny, & considering it depends on CONFIG_KEXEC_CORE, one
> > > probably wants to have kexec work properly if they can even enable it.
> >
> > Glad to know it works with the .config tweaking. I can not recall any
> > reason for that though.
> >
> > Since it sits in the efi code path, let's see how Ard thinks about
> > your proposal.
> >
>
> I don't understand why EFI_RUNTIME_MAP should depend on KEXEC_CORE at
> all: it is documented as a feature that can be enabled for debugging
> as well, and kexec does not work as expected without it.

Probably debugging was only mentioned in the help text, but was not
considered in the Kconfig logic :(

>
> Should we just change it like this perhaps?
>
> --- a/drivers/firmware/efi/Kconfig
> +++ b/drivers/firmware/efi/Kconfig
> @@ -28,8 +28,8 @@ config EFI_VARS_PSTORE_DEFAULT_DISABLE
>
>  config EFI_RUNTIME_MAP
>  	bool "Export efi runtime maps to sysfs"
> -	depends on X86 && EFI && KEXEC_CORE
> -	default y
> +	depends on X86 && EFI
> +	default KEXEC_CORE
>  	help
>
> and maybe add an 'if EXPERT' so that the option is only visible to
> modify when CONFIG_EXPERT=y ?

The above changes look good to me.

>
> In any case, I intend to move this code into arch/x86 as well, so I'll
> have a couple of patches out shortly.

That would be better since it is X86 only. Thanks, Ard.


2022-11-07 08:12:27

by Dave Young

Subject: Re: Bug: kexec on Lenovo ThinkPad T480 disables EFI mode

On Mon, 7 Nov 2022 at 15:36, Dave Young <[email protected]> wrote:
>
> Hi Ard,
>
> On Mon, 7 Nov 2022 at 15:30, Ard Biesheuvel <[email protected]> wrote:
> >
> > On Mon, 7 Nov 2022 at 07:55, Dave Young <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Sat, 5 Nov 2022 at 22:16, <[email protected]> wrote:
> > > >
> > > > On 2022-11-05 05:49, Dave Young wrote:
> > > > > Baoquan, thanks for cc me.
> > > > >
> > > > > On Sat, 5 Nov 2022 at 11:10, Baoquan He <[email protected]> wrote:
> > > > >>
> > > > >> Add Dave to CC
> > > > >>
> > > > >> On 10/28/22 at 01:02pm, [email protected] wrote:
> > > > >> > Greetings,
> > > > >> >
> > > > >> > I've been hitting a bug on my Lenovo ThinkPad T480 where kexecing will
> > > > >> > cause EFI mode (if that's the right term for it) to be unconditionally
> > > > >> > disabled, even when not using the --noefi option to kexec.
> > > > >> >
> > > > >> > What I mean by "EFI mode" being disabled, more than just EFI runtime
> > > > >> > services, is that basically nothing about the system's EFI is visible
> > > > >> > post-kexec. Normally you have a message like this in dmesg when the
> > > > >> > system is booted in EFI mode:
> > > > >> >
> > > > >> > [ 0.000000] efi: EFI v2.70 by EDK II
> > > > >> > [ 0.000000] efi: SMBIOS=0x7f98a000 ACPI=0x7fb7e000 ACPI 2.0=0x7fb7e014
> > > > >> > MEMATTR=0x7ec63018
> > > > >> > (obviously not the real firmware of the machine I'm talking about, but I
> > > > >> > can also send that if it would be of any help)
> > > > >> >
> > > > >> > No such message pops up in my dmesg as a result of this bug, & this
> > > > >> > causes some fallout like being unable to find the system's DMI
> > > > >> > information:
> > > > >> >
> > > > >> > <6>[ 0.000000] DMI not present or invalid.
> > > > >> >
> > > > >> > The efivarfs module also fails to load with -ENODEV.
> > > > >> >
> > > > >> > I've tried also booting with efi=runtime explicitly but it doesn't
> > > > >> > change anything. The kernel still does not print the name of the EFI
> > > > >> > firmware, DMI is still missing, & efivarfs still fails to load.
> > > > >> >
> > > > >> > I've been using the kexec_load syscall for all these tests, if it's
> > > > >> > important.
> > > > >> >
> > > > >> > Also, to make it very clear, all this only ever happens post-kexec. When
> > > > >> > booting straight from UEFI (with the EFI stub), all the aforementioned
> > > > >> > stuff that fails works perfectly fine (i.e. name of firmware is printed,
> > > > >> > DMI is properly found, & efivarfs loads & mounts just fine).
> > > > >> >
> > > > >> > This is reproducible with a vanilla 6.1-rc2 kernel. I've been trying to
> > > > >> > bisect it, but it seems like it goes pretty far back. I've got vanilla
> > > > >> > mainline kernel builds dating back to 5.17 that have the exact same
> > > > >> > issue. It might be worth noting that during this testing, I made sure
> > > > >> > the version of the kernel being kexeced & the kernel kexecing were the
> > > > >> > same version. It may not have been a problem in older kernels, but that
> > > > >> > would be difficult to test for me (a pretty important driver for this
> > > > >> > machine was only merged during v5.17-rc4). So it may not have been a
> > > > >> > regression & just a hidden problem since time immemorial.
> > > > >> >
> > > > >> > I am willing to test any patches I may get to further debug or fix
> > > > >> > this issue, preferably based on the current state of torvalds/linux.git.
> > > > >> > I can build & test kernels quite a few times per day.
> > > > >> >
> > > > >> > I can also send any important materials (kernel config, dmesg, firmware
> > > > >> > information, so on & so forth) on request. I'll also just mention I'm
> > > > >> > using kexec-tools 2.0.24 upfront, if it matters.
> > > > >
> > > > > Can you check the efi runtime in sysfs:
> > > > > ls /sys/firmware/efi/runtime-map/
> > > > >
> > > > > If nothing then maybe you did not enable CONFIG_EFI_RUNTIME_MAP=y, it
> > > > > is needed for kexec UEFI boot on x86_64.
> > > >
> > > > Oh my, it really is that simple.
> > > >
> > > > Indeed, enabling this in the pre-kexec kernel fixes it all up. I had
> > > > blindly disabled it in my quest to downsize the pre-kexec kernel to
> > > > reduce boot time (it only runs a bootloader). In hindsight, the firmware
> > > > drivers section is not really a good section to tweak on a whim.
> > > >
> > > > I'm terribly sorry to have taken your time to "fix" this "bug". But I
> > > > must ask, is there any reason why this is a visible config option, or at
> > > > least not gated behind CONFIG_EXPERT? drivers/firmware/efi/runtime-map.c
> > > > is pretty tiny, & considering it depends on CONFIG_KEXEC_CORE, one
> > > > probably wants to have kexec work properly if they can even enable it.
> > >
> > > Glad to know it works with the .config tweaking. I can not recall any
> > > reason for that though.
> > >
> > > Since it sits in the efi code path, let's see how Ard thinks about
> > > your proposal.
> > >
> >
> > I don't understand why EFI_RUNTIME_MAP should depend on KEXEC_CORE at
> > all: it is documented as a feature that can be enabled for debugging
> > as well, and kexec does not work as expected without it.
>
> Probably debugging only mentioned in text, but not been considered in
> the kconfig logic :(
>
> >
> > Should we just change it like this perhaps?
> >
> > --- a/drivers/firmware/efi/Kconfig
> > +++ b/drivers/firmware/efi/Kconfig
> > @@ -28,8 +28,8 @@ config EFI_VARS_PSTORE_DEFAULT_DISABLE
> >
> > config EFI_RUNTIME_MAP
> > bool "Export efi runtime maps to sysfs"
> > - depends on X86 && EFI && KEXEC_CORE
> > - default y
> > + depends on X86 && EFI
> > + default KEXEC_CORE
> > help
> >
> > and maybe add an 'if EXPERT' so that the option is only visible to
> > modify when CONFIG_EXPERT=y ?
>
> Above changes look good to me.
>
> >
> > In any case, I intend to move this code into arch/x86 as well, so I'll
> > have a couple of patches out shortly.
>
> That would be better since it is X86 only. Thanks, Ard.

Hmm, before doing that, do you think it is useful for debugging
purposes? That could be a reason to sit in efi code instead of x86 ..


2022-11-07 08:14:23

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: Bug: kexec on Lenovo ThinkPad T480 disables EFI mode

On Mon, 7 Nov 2022 at 08:40, Dave Young wrote:
>
> On Mon, 7 Nov 2022 at 15:36, Dave Young wrote:
> >
> > Hi Ard,
> >
> > On Mon, 7 Nov 2022 at 15:30, Ard Biesheuvel wrote:
> > >
> > > On Mon, 7 Nov 2022 at 07:55, Dave Young wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Sat, 5 Nov 2022 at 22:16, <[email protected]> wrote:
> > > > >
> > > > > On 2022-11-05 05:49, Dave Young wrote:
> > > > > > Baoquan, thanks for cc me.
> > > > > >
> > > > > > On Sat, 5 Nov 2022 at 11:10, Baoquan He wrote:
> > > > > >>
> > > > > >> Add Dave to CC
> > > > > >>
> > > > > >> On 10/28/22 at 01:02pm, [email protected] wrote:
> > > > > >> > Greetings,
> > > > > >> >
> > > > > >> > I've been hitting a bug on my Lenovo ThinkPad T480 where kexecing will
> > > > > >> > cause EFI mode (if that's the right term for it) to be unconditionally
> > > > > >> > disabled, even when not using the --noefi option to kexec.
> > > > > >> >
> > > > > >> > What I mean by "EFI mode" being disabled, more than just EFI runtime
> > > > > >> > services, is that basically nothing about the system's EFI is visible
> > > > > >> > post-kexec. Normally you have a message like this in dmesg when the
> > > > > >> > system is booted in EFI mode:
> > > > > >> >
> > > > > >> > [ 0.000000] efi: EFI v2.70 by EDK II
> > > > > >> > [ 0.000000] efi: SMBIOS=0x7f98a000 ACPI=0x7fb7e000 ACPI 2.0=0x7fb7e014
> > > > > >> > MEMATTR=0x7ec63018
> > > > > >> > (obviously not the real firmware of the machine I'm talking about, but I
> > > > > >> > can also send that if it would be of any help)
> > > > > >> >
> > > > > >> > No such message pops up in my dmesg as a result of this bug, & this
> > > > > >> > causes some fallout like being unable to find the system's DMI
> > > > > >> > information:
> > > > > >> >
> > > > > >> > <6>[ 0.000000] DMI not present or invalid.
> > > > > >> >
> > > > > >> > The efivarfs module also fails to load with -ENODEV.
> > > > > >> >
> > > > > >> > I've tried also booting with efi=runtime explicitly but it doesn't
> > > > > >> > change anything. The kernel still does not print the name of the EFI
> > > > > >> > firmware, DMI is still missing, & efivarfs still fails to load.
> > > > > >> >
> > > > > >> > I've been using the kexec_load syscall for all these tests, if it's
> > > > > >> > important.
> > > > > >> >
> > > > > >> > Also, to make it very clear, all this only ever happens post-kexec. When
> > > > > >> > booting straight from UEFI (with the EFI stub), all the aforementioned
> > > > > >> > stuff that fails works perfectly fine (i.e. name of firmware is printed,
> > > > > >> > DMI is properly found, & efivarfs loads & mounts just fine).
> > > > > >> >
> > > > > >> > This is reproducible with a vanilla 6.1-rc2 kernel. I've been trying to
> > > > > >> > bisect it, but it seems like it goes pretty far back. I've got vanilla
> > > > > >> > mainline kernel builds dating back to 5.17 that have the exact same
> > > > > >> > issue. It might be worth noting that during this testing, I made sure
> > > > > >> > the version of the kernel being kexeced & the kernel kexecing were the
> > > > > >> > same version. It may not have been a problem in older kernels, but that
> > > > > >> > would be difficult to test for me (a pretty important driver for this
> > > > > >> > machine was only merged during v5.17-rc4). So it may not have been a
> > > > > >> > regression & just a hidden problem since time immemorial.
> > > > > >> >
> > > > > >> > I am willing to test any patches I may get to further debug or fix
> > > > > >> > this issue, preferably based on the current state of torvalds/linux.git.
> > > > > >> > I can build & test kernels quite a few times per day.
> > > > > >> >
> > > > > >> > I can also send any important materials (kernel config, dmesg, firmware
> > > > > >> > information, so on & so forth) on request. I'll also just mention I'm
> > > > > >> > using kexec-tools 2.0.24 upfront, if it matters.
> > > > > >
> > > > > > Can you check the efi runtime in sysfs:
> > > > > > ls /sys/firmware/efi/runtime-map/
> > > > > >
> > > > > > If nothing then maybe you did not enable CONFIG_EFI_RUNTIME_MAP=y, it
> > > > > > is needed for kexec UEFI boot on x86_64.
> > > > >
> > > > > Oh my, it really is that simple.
> > > > >
> > > > > Indeed, enabling this in the pre-kexec kernel fixes it all up. I had
> > > > > blindly disabled it in my quest to downsize the pre-kexec kernel to
> > > > > reduce boot time (it only runs a bootloader). In hindsight, the firmware
> > > > > drivers section is not really a good section to tweak on a whim.
> > > > >
> > > > > I'm terribly sorry to have taken your time to "fix" this "bug". But I
> > > > > must ask, is there any reason why this is a visible config option, or at
> > > > > least not gated behind CONFIG_EXPERT? drivers/firmware/efi/runtime-map.c
> > > > > is pretty tiny, & considering it depends on CONFIG_KEXEC_CORE, one
> > > > > probably wants to have kexec work properly if they can even enable it.
> > > >
> > > > Glad to know it works with the .config tweaking. I can not recall any
> > > > reason for that though.
> > > >
> > > > Since it sits in the efi code path, let's see how Ard thinks about
> > > > your proposal.
> > > >
> > >
> > > I don't understand why EFI_RUNTIME_MAP should depend on KEXEC_CORE at
> > > all: it is documented as a feature that can be enabled for debugging
> > > as well, and kexec does not work as expected without it.
> >
> > Probably debugging only mentioned in text, but not been considered in
> > the kconfig logic :(
> >
> > >
> > > Should we just change it like this perhaps?
> > >
> > > --- a/drivers/firmware/efi/Kconfig
> > > +++ b/drivers/firmware/efi/Kconfig
> > > @@ -28,8 +28,8 @@ config EFI_VARS_PSTORE_DEFAULT_DISABLE
> > >
> > > config EFI_RUNTIME_MAP
> > > bool "Export efi runtime maps to sysfs"
> > > - depends on X86 && EFI && KEXEC_CORE
> > > - default y
> > > + depends on X86 && EFI
> > > + default KEXEC_CORE
> > > help
> > >
> > > and maybe add an 'if EXPERT' so that the option is only visible to
> > > modify when CONFIG_EXPERT=y ?
> >
> > Above changes look good to me.
> >
> > >
> > > In any case, I intend to move this code into arch/x86 as well, so I'll
> > > have a couple of patches out shortly.
> >
> > That would be better since it is X86 only. Thanks, Ard.
>
> Hmm, before doing that, do you think it is useful for debugging
> purposes? That could be a reason to sit in efi code instead of x86 ..
>

This code was only ever enabled on x86, and on ARM/arm64, we can
capture the memory map via efi=debug on any kernel build, and capture
the virtual mappings using PTDUMP (which also gives us the exact
attributes for each mapped region).

So I don't think it has that much value on non-x86 tbh.
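
For anyone who lands here with the same symptoms, the check Dave suggested
earlier in the thread (ls /sys/firmware/efi/runtime-map/) can be scripted.
Below is a minimal sketch, not anything from the kernel tree or kexec-tools:
the function name read_runtime_map and the FIELDS tuple are my own, and it
assumes the usual per-entry sysfs files (type, phys_addr, virt_addr,
num_pages, attribute). Those files are typically readable by root only, so
run it as root in the kernel you intend to kexec from.

```python
from pathlib import Path

# Where x86 kernels built with CONFIG_EFI_RUNTIME_MAP=y export the map.
RUNTIME_MAP_DIR = Path("/sys/firmware/efi/runtime-map")
# Per-entry attribute files (assumed layout, one small text file each).
FIELDS = ("type", "phys_addr", "virt_addr", "num_pages", "attribute")


def read_runtime_map(base=RUNTIME_MAP_DIR):
    """Return one dict per runtime-map entry, in numeric order.

    Returns an empty list when the directory is missing or has no
    numbered entries, which is the situation described in this thread.
    """
    base = Path(base)
    if not base.is_dir():
        return []
    numbered = [d for d in base.iterdir() if d.is_dir() and d.name.isdigit()]
    entries = []
    for entry in sorted(numbered, key=lambda d: int(d.name)):
        entries.append({f: (entry / f).read_text().strip() for f in FIELDS})
    return entries


if __name__ == "__main__":
    regions = read_runtime_map()
    if not regions:
        print("no runtime-map entries; check CONFIG_EFI_RUNTIME_MAP")
    for i, r in enumerate(regions):
        print(i, r["type"], r["phys_addr"], r["virt_addr"],
              r["num_pages"], r["attribute"])
```

An empty result in the pre-kexec kernel is the hint that the kexec'ed kernel
will not see EFI, as was the case in the original report.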

2022-11-07 08:15:18

by Dave Young

[permalink] [raw]
Subject: Re: Bug: kexec on Lenovo ThinkPad T480 disables EFI mode

On Mon, 7 Nov 2022 at 15:55, Ard Biesheuvel wrote:
>
> On Mon, 7 Nov 2022 at 08:40, Dave Young wrote:
> >
> > On Mon, 7 Nov 2022 at 15:36, Dave Young wrote:
> > >
> > > Hi Ard,
> > >
> > > On Mon, 7 Nov 2022 at 15:30, Ard Biesheuvel wrote:
> > > >
> > > > On Mon, 7 Nov 2022 at 07:55, Dave Young wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Sat, 5 Nov 2022 at 22:16, <[email protected]> wrote:
> > > > > >
> > > > > > On 2022-11-05 05:49, Dave Young wrote:
> > > > > > > Baoquan, thanks for cc me.
> > > > > > >
> > > > > > > On Sat, 5 Nov 2022 at 11:10, Baoquan He wrote:
> > > > > > >>
> > > > > > >> Add Dave to CC
> > > > > > >>
> > > > > > >> On 10/28/22 at 01:02pm, [email protected] wrote:
> > > > > > >> > Greetings,
> > > > > > >> >
> > > > > > >> > I've been hitting a bug on my Lenovo ThinkPad T480 where kexecing will
> > > > > > >> > cause EFI mode (if that's the right term for it) to be unconditionally
> > > > > > >> > disabled, even when not using the --noefi option to kexec.
> > > > > > >> >
> > > > > > >> > What I mean by "EFI mode" being disabled, more than just EFI runtime
> > > > > > >> > services, is that basically nothing about the system's EFI is visible
> > > > > > >> > post-kexec. Normally you have a message like this in dmesg when the
> > > > > > >> > system is booted in EFI mode:
> > > > > > >> >
> > > > > > >> > [ 0.000000] efi: EFI v2.70 by EDK II
> > > > > > >> > [ 0.000000] efi: SMBIOS=0x7f98a000 ACPI=0x7fb7e000 ACPI 2.0=0x7fb7e014
> > > > > > >> > MEMATTR=0x7ec63018
> > > > > > >> > (obviously not the real firmware of the machine I'm talking about, but I
> > > > > > >> > can also send that if it would be of any help)
> > > > > > >> >
> > > > > > >> > No such message pops up in my dmesg as a result of this bug, & this
> > > > > > >> > causes some fallout like being unable to find the system's DMI
> > > > > > >> > information:
> > > > > > >> >
> > > > > > >> > <6>[ 0.000000] DMI not present or invalid.
> > > > > > >> >
> > > > > > >> > The efivarfs module also fails to load with -ENODEV.
> > > > > > >> >
> > > > > > >> > I've tried also booting with efi=runtime explicitly but it doesn't
> > > > > > >> > change anything. The kernel still does not print the name of the EFI
> > > > > > >> > firmware, DMI is still missing, & efivarfs still fails to load.
> > > > > > >> >
> > > > > > >> > I've been using the kexec_load syscall for all these tests, if it's
> > > > > > >> > important.
> > > > > > >> >
> > > > > > >> > Also, to make it very clear, all this only ever happens post-kexec. When
> > > > > > >> > booting straight from UEFI (with the EFI stub), all the aforementioned
> > > > > > >> > stuff that fails works perfectly fine (i.e. name of firmware is printed,
> > > > > > >> > DMI is properly found, & efivarfs loads & mounts just fine).
> > > > > > >> >
> > > > > > >> > This is reproducible with a vanilla 6.1-rc2 kernel. I've been trying to
> > > > > > >> > bisect it, but it seems like it goes pretty far back. I've got vanilla
> > > > > > >> > mainline kernel builds dating back to 5.17 that have the exact same
> > > > > > >> > issue. It might be worth noting that during this testing, I made sure
> > > > > > >> > the version of the kernel being kexeced & the kernel kexecing were the
> > > > > > >> > same version. It may not have been a problem in older kernels, but that
> > > > > > >> > would be difficult to test for me (a pretty important driver for this
> > > > > > >> > machine was only merged during v5.17-rc4). So it may not have been a
> > > > > > >> > regression & just a hidden problem since time immemorial.
> > > > > > >> >
> > > > > > >> > I am willing to test any patches I may get to further debug or fix
> > > > > > >> > this issue, preferably based on the current state of torvalds/linux.git.
> > > > > > >> > I can build & test kernels quite a few times per day.
> > > > > > >> >
> > > > > > >> > I can also send any important materials (kernel config, dmesg, firmware
> > > > > > >> > information, so on & so forth) on request. I'll also just mention I'm
> > > > > > >> > using kexec-tools 2.0.24 upfront, if it matters.
> > > > > > >
> > > > > > > Can you check the efi runtime in sysfs:
> > > > > > > ls /sys/firmware/efi/runtime-map/
> > > > > > >
> > > > > > > If nothing then maybe you did not enable CONFIG_EFI_RUNTIME_MAP=y, it
> > > > > > > is needed for kexec UEFI boot on x86_64.
> > > > > >
> > > > > > Oh my, it really is that simple.
> > > > > >
> > > > > > Indeed, enabling this in the pre-kexec kernel fixes it all up. I had
> > > > > > blindly disabled it in my quest to downsize the pre-kexec kernel to
> > > > > > reduce boot time (it only runs a bootloader). In hindsight, the firmware
> > > > > > drivers section is not really a good section to tweak on a whim.
> > > > > >
> > > > > > I'm terribly sorry to have taken your time to "fix" this "bug". But I
> > > > > > must ask, is there any reason why this is a visible config option, or at
> > > > > > least not gated behind CONFIG_EXPERT? drivers/firmware/efi/runtime-map.c
> > > > > > is pretty tiny, & considering it depends on CONFIG_KEXEC_CORE, one
> > > > > > probably wants to have kexec work properly if they can even enable it.
> > > > >
> > > > > Glad to know it works with the .config tweaking. I can not recall any
> > > > > reason for that though.
> > > > >
> > > > > Since it sits in the efi code path, let's see how Ard thinks about
> > > > > your proposal.
> > > > >
> > > >
> > > > I don't understand why EFI_RUNTIME_MAP should depend on KEXEC_CORE at
> > > > all: it is documented as a feature that can be enabled for debugging
> > > > as well, and kexec does not work as expected without it.
> > >
> > > Probably debugging only mentioned in text, but not been considered in
> > > the kconfig logic :(
> > >
> > > >
> > > > Should we just change it like this perhaps?
> > > >
> > > > --- a/drivers/firmware/efi/Kconfig
> > > > +++ b/drivers/firmware/efi/Kconfig
> > > > @@ -28,8 +28,8 @@ config EFI_VARS_PSTORE_DEFAULT_DISABLE
> > > >
> > > > config EFI_RUNTIME_MAP
> > > > bool "Export efi runtime maps to sysfs"
> > > > - depends on X86 && EFI && KEXEC_CORE
> > > > - default y
> > > > + depends on X86 && EFI
> > > > + default KEXEC_CORE
> > > > help
> > > >
> > > > and maybe add an 'if EXPERT' so that the option is only visible to
> > > > modify when CONFIG_EXPERT=y ?
> > >
> > > Above changes look good to me.
> > >
> > > >
> > > > In any case, I intend to move this code into arch/x86 as well, so I'll
> > > > have a couple of patches out shortly.
> > >
> > > That would be better since it is X86 only. Thanks, Ard.
> >
> > Hmm, before doing that, do you think it is useful for debugging
> > purposes? That could be a reason to sit in efi code instead of x86 ..
> >
>
> This code was only ever enabled on x86, and on ARM/arm64, we can
> capture the memory map via efi=debug on any kernel build, and capture
> the virtual mappings using PTDUMP (which also gives us the exact
> attributes for each mapped region)
>
> So I don't think it has that much value on non-x86 tbh.

Ok, fair enough.

Thanks
Dave