2019-08-05 08:36:53

by Dave Young

[permalink] [raw]
Subject: [PATCH] do not clean dummy variable in kexec path

kexec reboot fails randomly in UEFI based kvm guest. The firmware
just reset while calling efi_delete_dummy_variable(); Unfortunately
I don't know how to debug the firmware, it is also possible a potential
problem on real hardware as well although nobody reproduced it.

The intention of efi_delete_dummy_variable is to trigger garbage collection
when entering virtual mode. But SetVirtualAddressMap can only run once
for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
a good place to clean dummy object.

Drop efi_delete_dummy_variable so that kexec reboot can work.

Signed-off-by: Dave Young <[email protected]>
---
arch/x86/platform/efi/efi.c | 3 ---
1 file changed, 3 deletions(-)

--- linux-x86.orig/arch/x86/platform/efi/efi.c
+++ linux-x86/arch/x86/platform/efi/efi.c
@@ -894,9 +894,6 @@ static void __init kexec_enter_virtual_m

if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
runtime_code_page_mkexec();
-
- /* clean DUMMY object */
- efi_delete_dummy_variable();
#endif
}


2019-08-05 15:56:39

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] do not clean dummy variable in kexec path

On Mon, 5 Aug 2019 at 11:36, Dave Young <[email protected]> wrote:
>
> kexec reboot fails randomly in UEFI based kvm guest. The firmware
> just reset while calling efi_delete_dummy_variable(); Unfortunately
> I don't know how to debug the firmware, it is also possible a potential
> problem on real hardware as well although nobody reproduced it.
>
> The intention of efi_delete_dummy_variable is to trigger garbage collection
> when entering virtual mode. But SetVirtualAddressMap can only run once
> for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> a good place to clean dummy object.
>

I would argue that this means it is not a good place to *create* the
dummy variable, and if we don't create it, we don't have to delete it
either.

> Drop efi_delete_dummy_variable so that kexec reboot can work.
>

Creating it and not deleting it is bad, so please try and see if we
can omit the creation on this code path instead.


> Signed-off-by: Dave Young <[email protected]>
> ---
> arch/x86/platform/efi/efi.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> --- linux-x86.orig/arch/x86/platform/efi/efi.c
> +++ linux-x86/arch/x86/platform/efi/efi.c
> @@ -894,9 +894,6 @@ static void __init kexec_enter_virtual_m
>
> if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
> runtime_code_page_mkexec();
> -
> - /* clean DUMMY object */
> - efi_delete_dummy_variable();
> #endif
> }
>

2019-08-05 17:10:44

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] do not clean dummy variable in kexec path

On Mon, Aug 5, 2019 at 1:36 AM Dave Young <[email protected]> wrote:
>
> kexec reboot fails randomly in UEFI based kvm guest. The firmware
> just reset while calling efi_delete_dummy_variable(); Unfortunately
> I don't know how to debug the firmware, it is also possible a potential
> problem on real hardware as well although nobody reproduced it.
>
> The intention of efi_delete_dummy_variable is to trigger garbage collection
> when entering virtual mode. But SetVirtualAddressMap can only run once
> for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> a good place to clean dummy object.

I agree that this isn't necessarily the best place to do this in the
kexec case, but given we control the firmware, figuring out what's
actually breaking seems like a good plan.

2019-08-06 02:42:55

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] do not clean dummy variable in kexec path

On 08/05/19 at 06:55pm, Ard Biesheuvel wrote:
> On Mon, 5 Aug 2019 at 11:36, Dave Young <[email protected]> wrote:
> >
> > kexec reboot fails randomly in UEFI based kvm guest. The firmware
> > just reset while calling efi_delete_dummy_variable(); Unfortunately
> > I don't know how to debug the firmware, it is also possible a potential
> > problem on real hardware as well although nobody reproduced it.
> >
> > The intention of efi_delete_dummy_variable is to trigger garbage collection
> > when entering virtual mode. But SetVirtualAddressMap can only run once
> > for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> > a good place to clean dummy object.
> >
>
> I would argue that this means it is not a good place to *create* the
> dummy variable, and if we don't create it, we don't have to delete it
> either.
>
> > Drop efi_delete_dummy_variable so that kexec reboot can work.
> >
>
> Creating it and not deleting it is bad, so please try and see if we
> can omit the creation on this code path instead.

I'm not sure in this case the var is created or not, the logic seems
tricky to me. It seems to me it is intend to force delete a non-exist
var here.

Matthew, can you comment here about Ard's question?

>
>
> > Signed-off-by: Dave Young <[email protected]>
> > ---
> > arch/x86/platform/efi/efi.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > --- linux-x86.orig/arch/x86/platform/efi/efi.c
> > +++ linux-x86/arch/x86/platform/efi/efi.c
> > @@ -894,9 +894,6 @@ static void __init kexec_enter_virtual_m
> >
> > if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
> > runtime_code_page_mkexec();
> > -
> > - /* clean DUMMY object */
> > - efi_delete_dummy_variable();
> > #endif
> > }
> >

2019-08-06 02:45:09

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] do not clean dummy variable in kexec path

On 08/05/19 at 10:09am, Matthew Garrett wrote:
> On Mon, Aug 5, 2019 at 1:36 AM Dave Young <[email protected]> wrote:
> >
> > kexec reboot fails randomly in UEFI based kvm guest. The firmware
> > just reset while calling efi_delete_dummy_variable(); Unfortunately
> > I don't know how to debug the firmware, it is also possible a potential
> > problem on real hardware as well although nobody reproduced it.
> >
> > The intention of efi_delete_dummy_variable is to trigger garbage collection
> > when entering virtual mode. But SetVirtualAddressMap can only run once
> > for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> > a good place to clean dummy object.
>
> I agree that this isn't necessarily the best place to do this in the
> kexec case, but given we control the firmware, figuring out what's
> actually breaking seems like a good plan.

I'm more than glad to get the root cause, if you can help on debugging I
would like to share the efi var file etc.

But it is indeed a problem cause weird reset on end user part, but even if we can
not find the root cause (in firmware..) I think we still need avoid it
with such workaround.

Thanks
Dave

2019-08-08 07:51:04

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] do not clean dummy variable in kexec path

On 08/05/19 at 06:55pm, Ard Biesheuvel wrote:
> On Mon, 5 Aug 2019 at 11:36, Dave Young <[email protected]> wrote:
> >
> > kexec reboot fails randomly in UEFI based kvm guest. The firmware
> > just reset while calling efi_delete_dummy_variable(); Unfortunately
> > I don't know how to debug the firmware, it is also possible a potential
> > problem on real hardware as well although nobody reproduced it.
> >
> > The intention of efi_delete_dummy_variable is to trigger garbage collection
> > when entering virtual mode. But SetVirtualAddressMap can only run once
> > for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> > a good place to clean dummy object.
> >
>
> I would argue that this means it is not a good place to *create* the
> dummy variable, and if we don't create it, we don't have to delete it
> either.
>
> > Drop efi_delete_dummy_variable so that kexec reboot can work.
> >
>
> Creating it and not deleting it is bad, so please try and see if we
> can omit the creation on this code path instead.
>

Check the code for the dummy var, it is created only in below chunk:
arch/x86/platform/efi/quirks.c:
efi_query_variable_store():
[snip]
/*
* We account for that by refusing the write if permitting it would
* reduce the available space to under 5KB. This figure was provided by
* Samsung, so should be safe.
*/
if ((remaining_size - size < EFI_MIN_RESERVE) &&
!efi_no_storage_paranoia) {

/*
* Triggering garbage collection may require that the firmware
* generate a real EFI_OUT_OF_RESOURCES error. We can force
* that by attempting to use more space than is available.
*/
unsigned long dummy_size = remaining_size + 1024;
void *dummy = kzalloc(dummy_size, GFP_KERNEL);

if (!dummy)
return EFI_OUT_OF_RESOURCES;

status = efi.set_variable((efi_char16_t *)efi_dummy_name,
&EFI_DUMMY_GUID,
EFI_VARIABLE_NON_VOLATILE |
EFI_VARIABLE_BOOTSERVICE_ACCESS |
EFI_VARIABLE_RUNTIME_ACCESS,
dummy_size, dummy);

if (status == EFI_SUCCESS) {
/*
* This should have failed, so if it didn't make sure
* that we delete it...
*/
efi_delete_dummy_variable();
}

[snip]

So the dummy var only be created when the if condition matched, also
once creating succeeded it is deleted. The deleting while entering
virtual mode is always deleting a non exist efi var. Please correct me
if I miss something.

If above is true, then at least in the kexec path can be dropped because
we have a real bug which resets machine.

Thanks
Dave