2019-08-16 06:15:20

by Daniel Drake

Subject: Early EFI-related boot freeze in parse_setup_data()

Hi,

We're working with a new consumer MiniPC based on AMD E1-7010.

It fails to boot Linux when booting in EFI mode - it hangs with
nothing on screen. earlycon=efifb doesn't show any output.

Looking closer, I was able to confirm that we reach EFI
ExitBootServices() via efi_printk in the EFI stub. But you can't use
EFI's console functionality after that point, so I then resorted to
inserting calls to:

idt_invalidate(NULL); __asm__ __volatile__("int3");

throughout the early boot code that follows, in order to force a system
reset. That way I could tell whether execution was reaching that point
(system reset) or not (system hang as before). As a side question, I'd
be curious whether there is any better way to debug such early boot
failures on consumer x86 hardware without a serial port...

Anyway, the system freeze occurs in parse_setup_data(), specifically:

data = early_memremap(pa_data, sizeof(*data));
data_len = data->len + sizeof(struct setup_data);

Dereferencing data->len causes the system to hang. I presume it
triggers an exception handler due to some kind of invalid memory
access.
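For readers less familiar with this code path: the two lines above sit inside a loop that follows the setup_data chain, a singly linked list of nodes chained by physical "next" addresses. The following is a minimal user-space sketch of that walk, not the kernel's code. It assumes a fake physical memory array, and fake_memremap() and put_node() are hypothetical stand-ins (the real early_memremap() installs a temporary fixmap mapping). The per-type dispatch is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the x86 boot protocol's struct setup_data header
 * (the kernel's version also has a flexible data[] payload after it). */
struct setup_data {
    uint64_t next; /* physical address of the next node; 0 ends the chain */
    uint32_t type;
    uint32_t len;  /* payload length, not counting this header */
};

/* Hypothetical stand-in for physical memory: "physical address" == offset. */
static uint8_t fake_phys[4096];

/* Hypothetical stand-in for early_memremap(): returns a pointer to the
 * bytes at pa, or NULL if [pa, pa + size) falls outside fake_phys. */
static const void *fake_memremap(uint64_t pa, size_t size)
{
    if (size > sizeof(fake_phys) || pa > sizeof(fake_phys) - size)
        return NULL;
    return &fake_phys[pa];
}

/* Hypothetical helper that writes a node header at physical address pa. */
static void put_node(uint64_t pa, uint64_t next, uint32_t type, uint32_t len)
{
    struct setup_data hdr = { next, type, len };

    assert(pa <= sizeof(fake_phys) - sizeof(hdr));
    memcpy(&fake_phys[pa], &hdr, sizeof(hdr));
}

/* Walks the chain the way parse_setup_data() does (minus the per-type
 * dispatch). Returns the number of nodes visited, or -1 if a node address
 * cannot be mapped; *total_len receives the summed header + payload sizes. */
static int walk_setup_data(uint64_t pa_data, uint64_t *total_len)
{
    int count = 0;

    *total_len = 0;
    while (pa_data) {
        struct setup_data data;
        const void *p = fake_memremap(pa_data, sizeof(data));

        if (!p)
            return -1;
        /* Reading this header is the step that hung on the affected machine. */
        memcpy(&data, p, sizeof(data));
        *total_len += data.len + sizeof(struct setup_data);
        pa_data = data.next;
        count++;
    }
    return count;
}
```

The header is copied out with memcpy() rather than dereferenced in place only so that the sketch avoids unaligned accesses into the byte array.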

By returning early in that function, boot continues basically fine. So
I could then log the details: pa_data has value 0x892bb018 and
early_memremap returns address 0xffffffffff200018. Accessing just a
single byte at that address causes the system hang.
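early_memremap() maps whole pages, then adds the physical address's offset within its page back onto the virtual address it returns. The sketch below illustrates that arithmetic. It assumes 4 KiB pages and treats 0xffffffffff200000 as the mapping slot's base, which is inferred from the returned pointer and was not stated anywhere. The helper names are hypothetical. Both reported values share the page offset 0x018, which is consistent with this scheme.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096) /* assumes 4 KiB pages */

/* Page-aligned part of an address: the frame that actually gets mapped. */
static uint64_t page_base(uint64_t addr)
{
    return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* Offset of an address within its page. */
static uint64_t page_offset(uint64_t addr)
{
    return addr & (SKETCH_PAGE_SIZE - 1);
}

/* Pointer a page-granular remap would hand back: the virtual address of
 * the mapping slot plus the physical address's offset within its page. */
static uint64_t remapped_virt(uint64_t slot_virt, uint64_t pa)
{
    assert(page_offset(slot_virt) == 0); /* slots are page aligned */
    return slot_virt + page_offset(pa);
}
```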

This original pa_data value (from boot_params.hdr.setup_data) was set
by the EFI stub in setup_efi_pci(). I confirmed that the same
0x892bb018 value is set there, it is not being corrupted along the
way.
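On the producer side, a node built in the EFI stub gets linked into the chain whose head is boot_params.hdr.setup_data. The following is a minimal sketch of that kind of tail append, not the stub's actual code. setup_data_append() is a hypothetical name. The sketch stores pointers directly as 64-bit "physical" addresses, which works only under an identity mapping such as the one UEFI provides on x86-64 before ExitBootServices().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct setup_data {
    uint64_t next; /* address of the next node; 0 ends the chain */
    uint32_t type;
    uint32_t len;  /* payload length, not counting this header */
};

/* Appends node to the chain whose head address is stored in *head (the role
 * boot_params.hdr.setup_data plays). Addresses are stored as plain 64-bit
 * integers, which only works where pointer value == physical address. */
static void setup_data_append(uint64_t *head, struct setup_data *node)
{
    struct setup_data *tail;

    assert(node != NULL);
    node->next = 0;
    if (*head == 0) {
        *head = (uint64_t)(uintptr_t)node;
        return;
    }
    tail = (struct setup_data *)(uintptr_t)*head;
    while (tail->next)
        tail = (struct setup_data *)(uintptr_t)tail->next;
    tail->next = (uint64_t)(uintptr_t)node;
}
```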

Any suggestions for how to diagnose further?

dmesg output:
https://gist.github.com/dsd/199bed7b590e90efdf73f9f6384ca551

Thanks
Daniel


2019-08-27 06:25:31

by Daniel Drake

Subject: Re: Early EFI-related boot freeze in parse_setup_data()

On Fri, Aug 16, 2019 at 2:14 PM Daniel Drake wrote:
> Anyway, the system freeze occurs in parse_setup_data(), specifically:
>
> data = early_memremap(pa_data, sizeof(*data));
> data_len = data->len + sizeof(struct setup_data);
>
> Dereferencing data->len causes the system to hang. I presume it
> triggers an exception handler due to some kind of invalid memory
> access.
>
> By returning early in that function, boot continues basically fine. So
> I could then log the details: pa_data has value 0x892bb018 and
> early_memremap returns address 0xffffffffff200018. Accessing just a
> single byte at that address causes the system hang.

I noticed a complaint about NX in the logs, right where it does the
early_memremap of this data (which is now at address 0x893c0018):

Notice: NX (Execute Disable) protection missing in CPU!
e820: update [mem 0x893c0018-0x893cec57] usable ==> usable
e820: update [mem 0x893c0018-0x893cec57] usable ==> usable
e820: update [mem 0x893b3018-0x893bf057] usable ==> usable
e820: update [mem 0x893b3018-0x893bf057] usable ==> usable
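The sizes of the two regions can be read off these lines. The sketch below assumes two things the thread does not state. First, the printed end address is inclusive, so length = end - start + 1. Second, each range covers one whole setup_data node (16-byte header plus payload). Under those assumptions the ranges come out to 0xec40 (60480) and 0xc040 (49216) bytes. Both ranges also start at page offset 0x018, like the pa_data values above. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the setup_data header: u64 next + u32 type + u32 len. */
#define SETUP_DATA_HDR_SIZE UINT64_C(16)

/* Assumes the "e820: update [mem START-END]" lines print END inclusively,
 * so the region length is END - START + 1. */
static uint64_t e820_range_len(uint64_t start, uint64_t end_inclusive)
{
    assert(end_inclusive >= start);
    return end_inclusive - start + 1;
}

/* Assumes the range covers one whole setup_data node (header + payload);
 * the payload length is whatever remains after the header. */
static uint64_t setup_data_payload_len(uint64_t start, uint64_t end_inclusive)
{
    uint64_t total = e820_range_len(start, end_inclusive);

    assert(total >= SETUP_DATA_HDR_SIZE);
    return total - SETUP_DATA_HDR_SIZE;
}
```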

Indeed, in the BIOS setup menu, "NX Mode" was Disabled.
Setting it to Enabled avoids the hang and Linux boots as normal. Weird!

Daniel