2013-03-06 16:55:41

by Peter Jones

Subject: Re: Revert commit 5dcd14ecd4 - breaks EFI boot with SLES11 elilo.efi

On Thu, Feb 28, 2013 at 01:12:11PM -0800, H. Peter Anvin wrote:
> Then make it follow the boot spec:
>
> > In 32-bit boot protocol, the first step in loading a Linux kernel
> > should be to setup the boot parameters (struct boot_params,
> > traditionally known as "zero page"). The memory for struct boot_params
> > should be allocated and initialized to all zero. Then the setup header
> > from offset 0x01f1 of kernel image on should be loaded into struct
> > boot_params and examined. The end of setup header can be calculated as
> > follow:
> >
> > 0x0202 + byte value at offset 0x0201
>
> ... so we don't have to.
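
For reference, the quoted paragraph can be sketched roughly as below. This is only an illustration of the documented steps, not code from any bootloader; the function name load_setup_header and the ZERO_PAGE_SIZE and SETUP_HDR_START constants are made up for the example, and it assumes the setup header lands at the same offset in struct boot_params as in the image.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZERO_PAGE_SIZE  4096u   /* assumed size of struct boot_params */
#define SETUP_HDR_START 0x01f1u /* setup header offset given in the spec */

/*
 * Zero the boot_params buffer, then copy only the setup header from the
 * kernel image into it. Returns the end offset of the setup header, or 0
 * if the image is too short to contain it.
 */
static size_t load_setup_header(uint8_t *zero_page,
                                const uint8_t *image, size_t image_len)
{
    size_t end;

    if (image_len < 0x0202u)
        return 0;

    /* End of setup header: 0x0202 + byte value at offset 0x0201. */
    end = 0x0202u + image[0x0201];
    if (end > image_len || end > ZERO_PAGE_SIZE)
        return 0;

    /* Everything outside the setup header must stay zero. */
    memset(zero_page, 0, ZERO_PAGE_SIZE);
    memcpy(zero_page + SETUP_HDR_START, image + SETUP_HDR_START,
           end - SETUP_HDR_START);
    return end;
}
```

The point of the sketch is the order of operations: clear first, then copy only the header range, so nothing else from the kernel image leaks into the structure.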

So, the problem here seems to be that there's never been widespread
compliance with this paragraph, but this patch assumes there has. A
brief survey concludes:

grub 1 on bios - loads the kernel and edits the parameters it cares
    about in place
grub 1 on efi - allocates a buffer (fails to clear it) and modifies
    the parameters it cares about, then copies it back
grub 2 on bios - clears the buffer, writes what it cares about
grub 2 on efi (using efi boot stub) - reads the buffer, modifies fields
    it cares about, passes the pointer to the boot stub
elilo - allocates a new buffer, copies the kernel structure in to it,
    allocates another buffer, clears it, copies the first structure
    in to it, frees the first buffer, modifies fields it cares about
    in the second buffer, clears some other fields in the second
    structure, and passes the pointer in when it calls the old entry
    point
    (It's possible that there's some newer version of elilo than 3.14,
    which I had handy, but I'm not going to do deeper research on a
    project that keeps a link to its CVS repo on the most obvious
    google result, lest I lose the will to live.)
syslinux - I'm just going to assume that your code matches the spec.

So it's certainly worth trying to find a better way to check this, but I
don't think this patch is it. If we're going to enforce it, we have to
make sure that a bootloader that's conforming to what was de facto the
standard in 0x020b still works. Otherwise we're just breaking
bootloaders for no reason, and that will end poorly.

I'd suggest we add a field for the bootloader to make a positive
declaration of what version it is using, and only check for the sentinel
if the field claims it's doing 0x020c or newer.
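
A rough sketch of what I mean, with made-up names: struct params_sketch, loader_protocol, and both helper functions are hypothetical and do not exist in the boot protocol today. It assumes the kernel image ships the new field with a value below 0x020c, so a loader that copies the image without touching the field never claims the new protocol by accident, and it models the sentinel as "nonzero means the structure was not cleared".

```c
#include <stdbool.h>
#include <stdint.h>

#define PROTO_2_12 0x020cu

/* Hypothetical subset of the boot parameters, for illustration only. */
struct params_sketch {
    uint16_t loader_protocol; /* hypothetical: protocol the loader claims */
    uint8_t  sentinel;        /* assumed: nonzero if never cleared */
};

/* Enforce only when the loader positively declares 0x020c or newer. */
static bool should_enforce_sentinel(const struct params_sketch *p)
{
    return p->loader_protocol >= PROTO_2_12;
}

/* True when a loader that claims 0x020c+ still left the sentinel set. */
static bool params_untrusted(const struct params_sketch *p)
{
    return should_enforce_sentinel(p) && p->sentinel != 0;
}
```

With this, a loader that claims 0x020b and leaves the sentinel set is still accepted, while one that claims 0x020c and leaves it set is flagged.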

--
Peter


2013-03-06 17:14:40

by H. Peter Anvin

Subject: Re: Revert commit 5dcd14ecd4 - breaks EFI boot with SLES11 elilo.efi

On 03/06/2013 08:55 AM, Peter Jones wrote:
>
> So, the problem here seems to be that there's never been widespread
> compliance with this paragraph, but this patch assumes there has. A
> brief survey concludes:
>

No, this patch doesn't assume there is widespread compliance, it is
trying to address the bits that are not complied with.

> grub 1 on bios - loads the kernel and edits the parameters it cares
>     about in place
> grub 1 on efi - allocates a buffer (fails to clear it) and modifies
>     the parameters it cares about, then copies it back
> grub 2 on bios - clears the buffer, writes what it cares about

On BIOS, anything that invokes the 16-bit entry point will be correct,
because it is the 16-bit code that sets up struct boot_params.

> grub 2 on efi (using efi boot stub) - reads the buffer, modifies fields
>     it cares about, passes the pointer to the boot stub
> elilo - allocates a new buffer, copies the kernel structure in to it,
>     allocates another buffer, clears it, copies the first structure
>     in to it, frees the first buffer, modifies fields it cares about
>     in the second buffer, clears some other fields in the second
>     structure, and passes the pointer in when it calls the old entry
>     point
>     (It's possible that there's some newer version of elilo than 3.14,
>     which I had handy, but I'm not going to do deeper research on a
>     project that keeps a link to its CVS repo on the most obvious
>     google result, lest I lose the will to live.)
> syslinux - I'm just going to assume that your code matches the spec.
>
> So it's certainly worth trying to find a better way to check this, but I
> don't think this patch is it. If we're going to enforce it, we have to
> make sure that a bootloader that's conforming to what was de facto the
> standard in 0x020b still works. Otherwise we're just breaking
> bootloaders for no reason, and that will end poorly.
>
> I'd suggest we add a field for the bootloader to make a positive
> declaration of what version it is using, and only check for the sentinel
> if the field claims it's doing 0x020c or newer.

Except it doesn't quite work. The problem is that these broken
bootloaders aren't just a matter of 2.11 vs 2.12, they are implicitly
assuming that the kernel image itself doesn't happen to contain anything
harmful in the fields that they don't bother initializing. This would
be nice and good, except that the demand for boot sector space is
fairly high and it gets very cantankerous to turn that into a minefield.

In fact, your suggestion is exactly equivalent to the sentinel, except
you want it to be pre-initialized with 0x20b instead of 0xffff.

As such, I don't really know anything better we can do other than:

1. detect the *properly working* case of the structure properly
   initialized;
2. do legacy bootloader-specific clearing based on the bootloader ID
   if the sentinel triggers -- if you can think of better heuristics
   then that would be good;
3. try to get bootloaders switched from case #2 to case #1.
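
Steps 1 and 2 might look something like the sketch below. It is deliberately simplified and is not the kernel's actual code: struct zero_page_sketch, the unset_a/unset_b stand-in fields, and EXAMPLE_LOADER_ID are invented for illustration, and the ID value is a placeholder rather than a claim about any real bootloader. It assumes the loader ID sits in the high nibble of type_of_loader.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_LOADER_ID 0x5 /* placeholder ID, illustration only */

struct zero_page_sketch {
    uint8_t sentinel;       /* nonzero: image copied without clearing */
    uint8_t type_of_loader; /* assumed: loader ID in the high nibble */
    uint8_t unset_a[8];     /* stand-ins for fields a loader may leave */
    uint8_t unset_b[8];     /* uninitialized */
};

/* Returns true if legacy clearing had to be applied. */
static bool sanitize_sketch(struct zero_page_sketch *zp)
{
    /* Case 1: the structure was properly zeroed, so leave it alone. */
    if (zp->sentinel == 0)
        return false;

    /* Case 2: sentinel triggered, so clear what the loader left stale. */
    switch (zp->type_of_loader >> 4) {
    case EXAMPLE_LOADER_ID:
        /* Hypothetical loader known to initialize unset_b itself. */
        memset(zp->unset_a, 0, sizeof zp->unset_a);
        break;
    default:
        /* Unknown loader: conservatively clear every stand-in field. */
        memset(zp->unset_a, 0, sizeof zp->unset_a);
        memset(zp->unset_b, 0, sizeof zp->unset_b);
        break;
    }
    zp->sentinel = 0;
    return true;
}
```

The per-loader switch is where any better heuristic would go; the default branch is the fallback when nothing is known about the loader.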

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-03-06 17:32:14

by Peter Jones

Subject: Re: Revert commit 5dcd14ecd4 - breaks EFI boot with SLES11 elilo.efi

On Wed, Mar 06, 2013 at 09:14:27AM -0800, H. Peter Anvin wrote:
> On 03/06/2013 08:55 AM, Peter Jones wrote:
> >
> > So, the problem here seems to be that there's never been widespread
> > compliance with this paragraph, but this patch assumes there has. A
> > brief survey concludes:
>
> No, this patch doesn't assume there is widespread compliance, it is
> trying to address the bits that are not complied with.

Right, but that's basically every x86_64 UEFI machine ever deployed.

[lots trimmed]
> > So it's certainly worth trying to find a better way to check this, but I
> > don't think this patch is it. If we're going to enforce it, we have to
> > make sure that a bootloader that's conforming to what was de facto the
> > standard in 0x020b still works. Otherwise we're just breaking
> > bootloaders for no reason, and that will end poorly.
> >
> > I'd suggest we add a field for the bootloader to make a positive
> > declaration of what version it is using, and only check for the sentinel
> > if the field claims it's doing 0x020c or newer.
>
> Except it doesn't quite work. The problem is that these broken
> bootloaders aren't just a matter of 2.11 vs 2.12, they are implicitly
> assuming that the kernel image itself doesn't happen to contain anything
> harmful in the fields that they don't bother initializing. This would
> be nice and good, except that the demand for boot sector space is
> fairly high and it gets very cantankerous to turn that into a minefield.

If your only objection is real estate, we can find a way to be clever
about what we do that uses already existing space. For instance, write
back the version number that's supported in the version field, but
byte-swapped, so we can tell it changed (we don't anticipate ever
supporting protocol 0x20b from a kernel that advertises 0xb02, right?)

Just one example - we don't have to do this the exact way I said; we
just need a positive assertion from the bootloader to start doing
enforcement. Versions would be nice, but they're not strictly required.
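
To make the byte-swap idea concrete, here is a small sketch. The function names swap16, loader_ack, and loader_acked are invented for this example, and the convention itself is only the proposal above, not anything the boot protocol defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Loader side: write back the supported protocol, byte-swapped. */
static void loader_ack(uint16_t *version_field, uint16_t loader_protocol)
{
    *version_field = swap16(loader_protocol);
}

/*
 * Kernel side: a field still equal to the advertised value was never
 * touched by the loader. Otherwise, undo the swap to recover the
 * protocol version the loader declared.
 */
static bool loader_acked(uint16_t version_field, uint16_t advertised,
                         uint16_t *loader_protocol)
{
    if (version_field == advertised)
        return false;
    *loader_protocol = swap16(version_field);
    return true;
}
```

A kernel advertising 0x020c would see 0x0c02 from a loader that acknowledges 2.12, and an untouched 0x020c from one that does not. A version whose two bytes are equal would swap to itself, so this only works while the major and minor bytes differ.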

> In fact, your suggestion is exactly equivalent to the sentinel, except
> you want it to be pre-initialized with 0x20b instead of 0xffff.

No, I want the bootloader to communicate that it understands the boot
protocol revision is 0x020c, so we can /safely/ enter a world where
we're forbidding booting from an older bootloader.

> As such, I don't really know anything better we can do other than:
>
> 1. detect the *properly working* case of the structure properly
>    initialized

Which is easy, but it doesn't seem to be anything anybody has ever
shipped on UEFI machines.

> 2. do legacy bootloader-specific clearing based on the bootloader ID
>    if the sentinel triggers -- if you can think of better heuristics
>    then that would be good;

This heuristic is "all UEFI bootloaders anybody uses". You can list
them individually, but it's the same as reverting the patch, just with
more code.

> 3. try to get bootloaders switched from case #2 to case #1.

And I'm for that, but I think we should delay enforcement until they've
got a way to express that.

--
Peter