2016-04-06 02:40:32

by Luis Chamberlain

Subject: HVMLite / PVHv2 - using x86 EFI boot entry

Boris sent out the first HVMLite series of patches to add a new Xen guest type
on February 1, 2016 [0]. We've been talking off list with a few folks about
whether, instead of adding yet-another-boot-entry, we should commit
HVMLite to using the x86 EFI boot entry. There's a series of reasons to consider
this; likewise there are reasons to question the effort required and whether it's
really needed. We'd like some more public review of this proposal, and to see if
others can come up with other ideas, both in favor of and against this proposal.

This is also a good time to get x86 Linux folks to chime in on
the general HVMLite design proposal, given that outside of the boot
entry discussion it would seem that many of us, myself included, didn't get the
memo about the proposed architecture review [1]. For my part, perhaps the
only sticking points of the design were the new boot entry, which came to me
as a surprise and which this thread addresses, and the lack of semantics
for early boot (which we may need to address; some of this is being
addressed in parallel through other work). The HVMLite document talks about
using ACPI_FADT_NO_VGA -- we don't use this yet upstream but I have some pending
changes which should make it easy to integrate its use on HVMLite. Perhaps
others have further points they want to raise now...

A huge summary of the discussion over the EFI boot option for HVMLite is now on a
wiki [2]; below I'll just provide the outline of the discussion. Consider this a
request for more public review, and feel free to take any of the items below and
elaborate on them as you see fit.

Worth mentioning also is that this topic will be discussed at the 2016 Xen
Hackathon, April 18-19 [3], at the ARM headquarters in Cambridge, UK, so if this
topic interests you, consider attending.

* Linux x86 Xen EFI boot entry evaluation
  * Issues with boot x86 boot entries
    * Bypassing native startup_32() / startup_64()
    * Small x86 zero page stubs

  * Xen evolution and roadmap
    * About PVH
    * About HVMLite
    * Xen ARM solution

  * Why use EFI for HVMlite
    * EFI calling conventions are standardized
    * EFI entry generalizes what new HVMLite entry proposes
    * Further semantics may be needed
    * Match Xen ARM's clean solution
    * You don't need full EFI emulation
      * Minimal EFI stubs for guests
        * GetMemoryMap()
        * ExitBootServices()
      * EFI stubs which may be needed for guests
        * Exit()
        * Variable operation functions
      * EFI stubs not needed for guests
        * GetTime()/SetTime()
        * SetVirtualAddressMap()
        * ResetSystem()
    * dom0 EFI
    * domU EFI emulation possibilities
      * Xen implements its own EFI environment for guests
      * Xen uses Tianocore / OVMF
    * kexec needs a boot path as well

  * Points against using EFI
    * Legacy PV guests need to be supported
    * Nulling the claimed boot loader effect
    * startup_32 / startup_64 flexibility
  * Remaining questions

[0] http://lkml.kernel.org/r/[email protected]
[1] http://lists.xen.org/archives/html/xen-devel/2016-02/msg01609.html
[2] http://kernelnewbies.org/KernelProjects/x86-xen-efi
[3] http://wiki.xenproject.org/wiki/Hackathon/April2016

Luis


2016-04-06 09:40:41

by David Vrabel

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On 06/04/16 03:40, Luis R. Rodriguez wrote:
>
> * You don't need full EFI emulation

I think needing any EFI emulation inside Xen (which is where it would
need to be for dom0) is not suitable because of the increase in
hypervisor ABI.

I also still do not understand your objection to the current tiny stub.

David

2016-04-06 11:09:17

by George Dunlap

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 6, 2016 at 3:40 AM, Luis R. Rodriguez <[email protected]> wrote:
> A huge summary of the discussion over EFI boot option for HVMLite is now on a
> wiki [2], below I'll just provide the outline of the discussion. Consider this a
> request for more public review, feel free to take any of the items below and
> elaborate on it as you see fit.
[snip]
> * Issues with boot x86 boot entries
> * Small x86 zero page stubs
[snip]
> * Points against using EFI
> * Nulling the claimed boot loader effect

I'm a bit confused about this. You list exactly two arguments against
the proposed stub in the "con" section:
1. Bootloaders may not be able to use the extra entry point
2. It's an extra entry point

And then later, in another section, you actually list the reason #1 is
irrelevant: bootloaders don't matter because the stub is there to boot
from the Xen hypervisor.

So the only actual argument you have against the proposed PVH stub in
the linked document is that it's an extra entry point.

> * Why use EFI for HVMlite
> * EFI calling conventions are standardized
> * EFI entry generalizes what new HVMLite entry proposes
> * Further semantics may be needed
> * Match Xen ARM's clean solution
> * You don't need full EFI emulation
> * Minimal EFI stubs for guests
> * GetMemoryMap()
> * ExitBootServices()
> * EFI stubs which may be needed for guests
> * Exit()
> * Variable operation functions
> * EFI stubs not needed for guests
> * GetTime()/SetTime()
> * SetVirtualAddressMap()
> * ResetSystem()
> * dom0 EFI
> * domU EFI emulation possibilities
> * Xen implements its own EFI environment for guests
> * Xen uses Tianocore / OVMF

So rather than make a new entry point which does just the minimal
amount of work to run on a software interface (Xen), you want to take
an interface designed for hardware (EFI) and put in hacks so that it
knows that sometimes some EFI services are not available? That sounds
like it's going to make the EFI path just as unmanageable as the
current PV path.

Using the EFI entry point would certainly make sense if it was
actually simpler than the proposed extra entry point. But it sounds
like it's going to be more complicated, not only for Xen, but also for
Linux.

-George

2016-04-06 11:12:48

by Daniel Kiper

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 06, 2016 at 04:40:27AM +0200, Luis R. Rodriguez wrote:
> Boris sent out the first HVMLite series of patches to add a new Xen guest type
> February 1, 2016 [0]. We've been talking off list with a few folks now over
> the prospect of instead of adding yet-another-boot-entry we instead fixate
> HVMLite to use the x86 EFI boot entry. There's a series of reasons to consider
> this, likewise there are reasons to question the effort required and if its
> really needed. We'd like some more public review of this proposal, and see if
> others can come up with other ideas, both in favor or against this proposal.
>
> This in particular is also a good time to get x86 Linux folks to chime on on
> the general design proposal of HVMLite design, given that outside of the boot
> entry discussion it would seem including myself that we didn't get the memo
> over the proposed architecture review [1]. At least on my behalf perhaps the
> only sticking thorns of the design was the new boot entry, which came to me
> as a surprise, and this thread addresses and the lack of addressing semantics
> for early boot (which we may seem to need to address; some of this is being
> addressing in parallels through other work). The HVMLite document talks about
> using ACPI_FADT_NO_VGA -- we don't use this yet upstream but I have some pending
> changes which should make it easy to integrate its use on HVMLite. Perhaps
> there are others that may have some other points they may want to raise now...
>
> A huge summary of the discussion over EFI boot option for HVMLite is now on a
> wiki [2], below I'll just provide the outline of the discussion. Consider this a
> request for more public review, feel free to take any of the items below and
> elaborate on it as you see fit.
>
> Worth mentioning also is that this topic will be discussed at the 2016 Xen
> Hackathon April 18-19 [3] at the ARM Cambridge, UK Headquarters so if you can
> attend and this topic interests you, consider attending.

I hope that you will be there as one of the biggest proponents of the EFI entry point.
If you are not, it will be difficult or impossible to discuss this issue.
In the worst case I can raise this topic on your behalf and then we should organize
a phone call if possible (and accepted by others). However, to do that I must know your
plans in advance.

Daniel

2016-04-06 15:02:45

by Matt Fleming

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, 06 Apr, at 12:07:36PM, George Dunlap wrote:
>
> So rather than make a new entry point which does just the minimal
> amount of work to run on a software interface (Xen), you want to take
> an interface designed for hardware (EFI) and put in hacks so that it
> knows that sometimes some EFI services are not available? That sounds
> like it's going to make the EFI path just as unmanageable as the
> current PV path.

Requiring code in the new entry point to manipulate control registers
and do the switch to long-mode does not seem like a minimal amount of
code to me,

http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00134.html

What's likely to happen in the future is that startup_(32|64) will be
entered with different settings depending on whether coming from
HVMlite or bare metal, due to the natural tendency for these kinds of
code paths to diverge.

Sometimes EFI runtime services are not available on bare metal
hardware too, for example, when booting 32-bit kernels on 64-bit EFI
or 64-bit kernels on 32-bit EFI without CONFIG_EFI_MIXED. Or when
booting with the "noefi" kernel command line parameter. That's how
things work today when booting Xen, we disable the runtime services.

EFI boot services are a different story however, and the EFI boot stub
would need to be changed to handle that. Though honestly, it would
make more sense to provide EFI service stubs in the kernel image
itself that are implemented using hypercalls, assuming you can run
hypercalls that early in boot.
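
As a rough illustration of what kernel-provided boot service stubs could look
like, here is a minimal C sketch of the two calls the outline lists as the
minimum for guests, GetMemoryMap() and ExitBootServices(). Everything in it is
an assumption for illustration: the stub_* names, the simplified mem_desc
layout, and the fixed guest_map table standing in for whatever a hypercall
would return. It is not the UEFI ABI and not code from the HVMLite series; it
only mirrors the buffer-size probe and map-key handshake that the real calls
use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified status codes; the real EFI_STATUS values differ. */
typedef enum {
    STUB_SUCCESS = 0,
    STUB_BUFFER_TOO_SMALL,
    STUB_INVALID_PARAMETER
} stub_status;

/* Simplified descriptor; the real EFI_MEMORY_DESCRIPTOR has more fields. */
struct mem_desc {
    uint32_t type;        /* 7 is EfiConventionalMemory in the UEFI spec */
    uint64_t phys_start;
    uint64_t num_pages;
};

/* Hypothetical: stands in for a memory map obtained via hypercall. */
static const struct mem_desc guest_map[] = {
    { 7, 0x00000000, 0x100 },
    { 7, 0x00100000, 0x7f00 },
};
static const size_t guest_map_key = 1;
static int boot_services_active = 1;

/* Copy the map out, or report the size the caller must provide. */
stub_status stub_get_memory_map(size_t *size, struct mem_desc *buf, size_t *key)
{
    if (!size || !key)
        return STUB_INVALID_PARAMETER;
    if (*size < sizeof(guest_map)) {
        *size = sizeof(guest_map);
        return STUB_BUFFER_TOO_SMALL;
    }
    if (!buf)
        return STUB_INVALID_PARAMETER;
    memcpy(buf, guest_map, sizeof(guest_map));
    *size = sizeof(guest_map);
    *key = guest_map_key;
    return STUB_SUCCESS;
}

/* Only a caller holding the current map key may end boot services. */
stub_status stub_exit_boot_services(size_t key)
{
    if (key != guest_map_key)
        return STUB_INVALID_PARAMETER;
    boot_services_active = 0;
    return STUB_SUCCESS;
}
```

A caller would probe with a zero-sized buffer, retry with the reported size,
and pass the returned key to stub_exit_boot_services(); a stale key is
rejected, as it would be on real firmware.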

One place that struck me as suitable for this "hypercall in an EFI
service stub" approach is the trouble with doing ACPI reboot as
documented here,

http://lists.xen.org/archives/html/xen-devel/2016-02/msg01609.html

Performing the reset hypercall from within HVMlite's custom EfiReset()
service would avoid having to touch ACPICA at all, and would be
indistinguishable from bare metal.
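
A minimal sketch of that idea in C, under stated assumptions:
hvmlite_efi_reset_system() is a hypothetical name for the custom reset
service, and shutdown_hypercall is a function pointer standing in for the
guest's real shutdown hypercall wrapper (here it only records its argument so
the mapping can be exercised on any host). The reset type values follow
EFI_RESET_TYPE from the UEFI specification and the shutdown reasons follow the
SHUTDOWN_* codes in Xen's public sched.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Numbered as EFI_RESET_TYPE in the UEFI specification. */
enum efi_reset_type {
    EFI_RESET_COLD = 0,
    EFI_RESET_WARM = 1,
    EFI_RESET_SHUTDOWN = 2
};

/* Numbered as SHUTDOWN_poweroff / SHUTDOWN_reboot in Xen's sched.h. */
enum xen_shutdown_reason {
    XEN_SHUTDOWN_POWEROFF = 0,
    XEN_SHUTDOWN_REBOOT = 1
};

/* Hypothetical hook: a real guest would issue the shutdown hypercall here. */
typedef void (*shutdown_hypercall_fn)(unsigned int reason);

static unsigned int last_reason = ~0u;

/* Placeholder backend that only records the requested reason. */
static void record_shutdown(unsigned int reason)
{
    last_reason = reason;
}

static shutdown_hypercall_fn shutdown_hypercall = record_shutdown;

/* Cold and warm resets both map to a reboot; shutdown maps to poweroff. */
static unsigned int reset_type_to_reason(enum efi_reset_type type)
{
    return type == EFI_RESET_SHUTDOWN ? XEN_SHUTDOWN_POWEROFF
                                      : XEN_SHUTDOWN_REBOOT;
}

/* Same argument shape as ResetSystem(); status and data are ignored here. */
void hvmlite_efi_reset_system(enum efi_reset_type type, uint64_t status,
                              size_t data_size, void *data)
{
    (void)status;
    (void)data_size;
    (void)data;
    shutdown_hypercall(reset_type_to_reason(type));
}
```

The point of routing through the service table is that the generic reboot
code would call the reset service as it does on bare metal, and only this stub
would know a hypervisor is underneath.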

> Using the EFI entry point would certainly make sense if it was
> actually simpler than the proposed extra entry point. But it sounds
> like it's going to be more complicated, not only for Xen, but also for
> Linux.

Until someone sits down and writes the code I think we're going to be
arguing back and forth over this particular point.

2016-04-07 18:51:53

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 06, 2016 at 12:07:36PM +0100, George Dunlap wrote:
> On Wed, Apr 6, 2016 at 3:40 AM, Luis R. Rodriguez <[email protected]> wrote:
> > A huge summary of the discussion over EFI boot option for HVMLite is now on a
> > wiki [2], below I'll just provide the outline of the discussion. Consider this a
> > request for more public review, feel free to take any of the items below and
> > elaborate on it as you see fit.
> [snip]
> > * Issues with boot x86 boot entries
> > * Small x86 zero page stubs
> [snip]
> > * Points against using EFI
> > * Nulling the claimed boot loader effect
>
> I'm a bit confused about this. You list exactly two arguments against
> the proposed stub in the "con" section:
> 1. Bootloaders may not be able to use the extra entry point
> 2. It's an extra entry point
>
> And then later, in another section, you actually list the reason #1 is
> irrelevant: bootloaders don't matter because the stub is there to boot
> from the Xen hypervisor.

Forgive me, the private thread was ongoing and I really wanted to capture
both sides of the arguments expressed and move any extensions
to the discussion onto the list. This meant annotating both positions and letting
others fill in the gaps to determine whether one position was in fact
nullified by the other.

First I should state that it is only natural for anyone sensible to have
a knee-jerk reaction, to kick and scream, at the idea of
adding yet another x86 entry point for Linux... IMO one should not expect
Linux to simply accept yet-another-entry-point;
rather, it should be expected that people really dig in
and do their homework, so that if they are going to
add yet-another-entry-point they have validated it and exhausted
review of all possible avenues.

It was Andrew Cooper's position that boot loaders would not need to be
involved, and that would seem to nullify Matt's original position on this.

While Andrew's position is right in that perhaps only Xen tools have to deal
with the HVMLite specific entry, it would still mean diverging from ARM's
own EFI-entry-only position. To clarify, ARM has no custom
Xen entry, and we should strive to match that. Anything far from that really
deserves an explanation to me, especially if we are going to argue that HVMLite is
the best that x86 Xen can do.

Ultimately unifying entry approaches for Xen in a streamlined fashion seems
like a sensible thing to strive for. Anything we push in the other direction,
as small as it can be, should deserve at least a 'hey, wait a minute'...

> So the only actual argument you have against the proposed PVH stub in
> the linked document is that it's an extra entry point.

Then you have not really read the document well. More to the point,
the EFI entry already does what the small HVMLite stub does and already
provides an existing entry and path to the kernel, so why should we
add yet another small stub?

Moreover, if the EFI entry already provides a way into Linux
in a more streamlined fashion, bringing it closer to the bare metal
boot entry, why *would* we add another boot entry to x86, even if
it's small and self-contained?

Another position against small stubs, which I listed myself, is that we may need
more semantics for early boot even if the new HVMLite small stub is added. This
remains to be seen. If we are going to add new semantics, it would seem best to
use something more standard like EFI configuration tables rather than hack
further custom semantics onto x86. Custom sloppy semantics have proven to be
misused, and ultimately became a sloppy mess. To take this further,
virtualization semantics are being abused even outside of Xen -- driver
developers may think that just because some semantics are available they can
use them to fine tune drivers for virtualized environments.
Even the best of our folks have claimed certain hacks are
*impossible* to change [0], when in fact only 4 days later a completely sensible
replacement was found [1], and this was outside of Xen's situation, so it's
not only Xen I am being careful about here with regards to semantics. If we need
early boot code semantics or general kernel semantics for virtualization I want to
address that now, and I want to be very careful with that given the abuse.
I'm doing my part to clarify sloppy old semantics on Xen [2],
and this effort is even proving to pave the path for HVMLite: for
instance, consider the gains of using the legacy devices struct
in the future for ACPI_FADT_NO_VGA, which HVMLite's specification seems
to say it will use. Clearing out the paravirt_enabled() hack for
pnpbios helped push for the right architectural solution to pave the path
for this in a generic fashion.

[0] http://lkml.kernel.org/r/[email protected]
[1] https://www.spinics.net/lists/alsa-devel/msg48627.html
[2] http://lkml.kernel.org/r/[email protected]

>
> > * Why use EFI for HVMlite
> > * EFI calling conventions are standardized
> > * EFI entry generalizes what new HVMLite entry proposes
> > * Further semantics may be needed
> > * Match Xen ARM's clean solution
> > * You don't need full EFI emulation
> > * Minimal EFI stubs for guests
> > * GetMemoryMap()
> > * ExitBootServices()
> > * EFI stubs which may be needed for guests
> > * Exit()
> > * Variable operation functions
> > * EFI stubs not needed for guests
> > * GetTime()/SetTime()
> > * SetVirtualAddressMap()
> > * ResetSystem()
> > * dom0 EFI
> > * domU EFI emulation possibilities
> > * Xen implements its own EFI environment for guests
> > * Xen uses Tianocore / OVMF
>
> So rather than make a new entry point which does just the minimal
> amount of work to run on a software interface (Xen), you want to take
> an interface designed for hardware (EFI) and put in hacks so that it
> knows that sometimes some EFI services are not available?

The purpose of the discussion is to evaluate the EFI entry as a possible
alternative candidate to yet another entry point, from a completely neutral
engineering position.

> That sounds like it's going to make the EFI path just as unmanageable as the
> current PV path.

Can you describe how?

> Using the EFI entry point would certainly make sense if it was
> actually simpler than the proposed extra entry point. But it sounds
> like it's going to be more complicated, not only for Xen, but also for
> Linux.

How so? Please provide specifics.

Luis

2016-04-07 19:12:40

by Luis Chamberlain

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 06, 2016 at 01:11:30PM +0200, Daniel Kiper wrote:
> On Wed, Apr 06, 2016 at 04:40:27AM +0200, Luis R. Rodriguez wrote:
> > Boris sent out the first HVMLite series of patches to add a new Xen guest type
> > February 1, 2016 [0]. We've been talking off list with a few folks now over
> > the prospect of instead of adding yet-another-boot-entry we instead fixate
> > HVMLite to use the x86 EFI boot entry. There's a series of reasons to consider
> > this, likewise there are reasons to question the effort required and if its
> > really needed. We'd like some more public review of this proposal, and see if
> > others can come up with other ideas, both in favor or against this proposal.
> >
> > This in particular is also a good time to get x86 Linux folks to chime on on
> > the general design proposal of HVMLite design, given that outside of the boot
> > entry discussion it would seem including myself that we didn't get the memo
> > over the proposed architecture review [1]. At least on my behalf perhaps the
> > only sticking thorns of the design was the new boot entry, which came to me
> > as a surprise, and this thread addresses and the lack of addressing semantics
> > for early boot (which we may seem to need to address; some of this is being
> > addressing in parallels through other work). The HVMLite document talks about
> > using ACPI_FADT_NO_VGA -- we don't use this yet upstream but I have some pending
> > changes which should make it easy to integrate its use on HVMLite. Perhaps
> > there are others that may have some other points they may want to raise now...
> >
> > A huge summary of the discussion over EFI boot option for HVMLite is now on a
> > wiki [2], below I'll just provide the outline of the discussion. Consider this a
> > request for more public review, feel free to take any of the items below and
> > elaborate on it as you see fit.
> >
> > Worth mentioning also is that this topic will be discussed at the 2016 Xen
> > Hackathon April 18-19 [3] at the ARM Cambridge, UK Headquarters so if you can
> > attend and this topic interests you, consider attending.
>
> I hope that you will be there as one of the biggest proponents of EFI entry point.

It would be a last minute trip to prepare for...

> If you does not it will be difficult or impossible to discuss this issue without you.
> In the worst case I can raise this topic on behalf of you and then we should organize
> phone call if possible (and accepted by others). However, to do that I must know your
> plans in advance.

I understand. I'd like to make it clear that I am simply taking a neutral position
on this topic, even though it may seem I'm a die-hard on this idea. This was
simply an architectural question that came up, and I have been
dissatisfied with the answers to the architectural questions I had about
this.

To help evaluate how neutral a discussion like this can really be,
can someone please chime in on whether there are pressures to
just complete the HVMLite design already? How strong are those? Are we really
able to have a very neutral technical discussion on this?

Luis

2016-04-08 14:16:38

by George Dunlap

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On 07/04/16 19:51, Luis R. Rodriguez wrote:
> While Andrew's position is right in that perhaps only Xen tools have to deal
> with the HVMLite specific entry, it would also still mean diverging from ARM's
> own EFI entry only position, which I'd like to clarify that ARM has no custom
> Xen entry, we should strive to match that. Anything far from that to me really
> deserves an explanation, specially if we are going to argue that HVMLite is
> the best that x86 Xen can do.
>
> Ultimately unifying entry approaches for Xen in a streamlined fashion seems
> like a sensible thing to strive for. Anything we push in the other direction,
> as small as it can be, should deserve at least a 'hey, wait a minute'...

Quick factual correction here.

"Since ARM guests only use the EFI entry point, x86 guests should also
only use the EFI entry point" is certainly a reasonable argument to make.

However, dom0 on ARM does not use the EFI entry point. When starting
dom0, Xen uses the native entry point (the one that UBoot uses) and
hands dom0 a device-tree node. The reason this is possible on ARM is
that there are no assumptions made about what hardware is or is not
present on the system -- everything that needs to be communicated about
what is or is not present can be passed in DT.

So it is incorrect to say that ARM has an "EFI entry only" position.

(On ACPI systems, it does apparently generate some UEFI informational
tables, which it passes to the dom0 kernel via DT; and the kernel
unpacks and puts in the right place. Normal Xen ARM guests can use EFI,
but that's because we start OVMF in the guest context to provide the EFI
services. These may be where the idea that ARM guests use only the UEFI
entry point came from.)

Obviously it would be nice if we could use the native entry point on x86
as well, but there's decades of legacy hardware and backwards
compatibility to deal with there.

(Julien is a Xen ARM maintainer, he can correct me if I've said
something incorrect.)

-George

2016-04-08 20:40:42

by Luis Chamberlain

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
> On 06/04/16 03:40, Luis R. Rodriguez wrote:
> >
> > * You don't need full EFI emulation
>
> I think needing any EFI emulation inside Xen (which is where it would
> need to be for dom0) is not suitable because of the increase in
> hypervisor ABI.

Is this because of timing on architecture / design of HVMLite, or
a general position that the complexity to deal with EFI emulation
is too much for Xen's taste ?

ARM already went the EFI entry way for domU -- it went the OVMF route.
Would such a route be possible for x86 domU HVMLite? If not, why
not? It would seem to make sense to at least mimic the same type
of early boot environment, and perhaps there are some lessons to be
learned from that effort too.

Are there some lessons to be learned with ARM's effort? What are they?
If that could be re-done again with any type of cleaner path, what
could that be that could help the x86 side ?

Although emulating EFI may require work, some folks have pointed out
that the amount of work may not be that much. If that is done can
we instead rely on the same code to replace OVMF to support both
Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
this ?

> I also still do not understand your objection to the current tiny stub.

It's more of a hypothetical -- can an EFI entry be used instead, given
it already does exactly what the new small entry does? It's also rather
odd to add a new entry without fully evaluating a possible alternative
that would provide the exact same mechanism.

A full technical unbiased evaluation of the different approaches is what I'd
hope we could strive to achieve through discussion and peer review, thinking
and prioritizing ultimately what is best to minimize the impact on Linux
and also help take advantage of the best features possible through both
means. Thinking long term, not immediate short term.

Luis

2016-04-08 21:59:00

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Fri, Apr 08, 2016 at 03:16:14PM +0100, George Dunlap wrote:
> On 07/04/16 19:51, Luis R. Rodriguez wrote:
> > While Andrew's position is right in that perhaps only Xen tools have to deal
> > with the HVMLite specific entry, it would also still mean diverging from ARM's
> > own EFI entry only position, which I'd like to clarify that ARM has no custom
> > Xen entry, we should strive to match that. Anything far from that to me really
> > deserves an explanation, specially if we are going to argue that HVMLite is
> > the best that x86 Xen can do.
> >
> > Ultimately unifying entry approaches for Xen in a streamlined fashion seems
> > like a sensible thing to strive for. Anything we push in the other direction,
> > as small as it can be, should deserve at least a 'hey, wait a minute'...
>
> Quick factual correction here.
>
> "Since ARM guests only use the EFI entry point, x86 guests should also
> only use the EFI entry point" is certainly a reasonable argument to make.
>
> However, dom0 on ARM does not use the EFI entry point. When starting
> dom0, Xen uses the native entry point (the one that UBoot uses) and
> hands dom0 a device-tree node. The reason this is possible on ARM is
> that there are no assumptions made about what hardware is or is not
> present on the system -- everything that needs to be communicated about
> what is or is not present can be passed in DT.
>
> So it is incorrect to say that ARM has an "EFI entry only" position.
>
> (On ACPI systems, it does apparently generate some UEFI informational
> tables, which it passes to the dom0 kernel via DT; and the kernel
> unpacks and puts in the right place. Normal Xen ARM guests can use EFI,
> but that's because we start OVMF in the guest context to provide the EFI
> services. These may be where the idea that ARM guests use only the UEFI
> entry point came from.)
>
> Obviously it would be nice if we could use the native entry point on x86
> as well, but there's decades of legacy hardware and backwards
> compatibility to deal with there.

OK thanks for the clarification -- still no custom entries for Xen!
We should strive for that, at the very least.

You do have a point about the legacy stuff. There are two options there:

* Fold legacy support under HVMLite -- which seems to be what we
  currently want to do (we should evaluate the implications and
  requirements for that); or

* Leave legacy stuff on the old PV path; this may be something to
  bring to the table if we had in place a proactive solution to
  avoid further fallout from the huge architectural differences
  between the entries. The work I'm doing should help with that. (We should
  evaluate the implications and requirements for that as
  well).

Luis

2016-04-09 17:02:23

by Luis Chamberlain

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 06, 2016 at 01:11:30PM +0200, Daniel Kiper wrote:
> On Wed, Apr 06, 2016 at 04:40:27AM +0200, Luis R. Rodriguez wrote:
> > Boris sent out the first HVMLite series of patches to add a new Xen guest type
> > February 1, 2016 [0]. We've been talking off list with a few folks now over
> > the prospect of instead of adding yet-another-boot-entry we instead fixate
> > HVMLite to use the x86 EFI boot entry. There's a series of reasons to consider
> > this, likewise there are reasons to question the effort required and if its
> > really needed. We'd like some more public review of this proposal, and see if
> > others can come up with other ideas, both in favor or against this proposal.
> >
> > This in particular is also a good time to get x86 Linux folks to chime on on
> > the general design proposal of HVMLite design, given that outside of the boot
> > entry discussion it would seem including myself that we didn't get the memo
> > over the proposed architecture review [1]. At least on my behalf perhaps the
> > only sticking thorns of the design was the new boot entry, which came to me
> > as a surprise, and this thread addresses and the lack of addressing semantics
> > for early boot (which we may seem to need to address; some of this is being
> > addressing in parallels through other work). The HVMLite document talks about
> > using ACPI_FADT_NO_VGA -- we don't use this yet upstream but I have some pending
> > changes which should make it easy to integrate its use on HVMLite. Perhaps
> > there are others that may have some other points they may want to raise now...
> >
> > A huge summary of the discussion over EFI boot option for HVMLite is now on a
> > wiki [2], below I'll just provide the outline of the discussion. Consider this a
> > request for more public review, feel free to take any of the items below and
> > elaborate on it as you see fit.
> >
> > Worth mentioning also is that this topic will be discussed at the 2016 Xen
> > Hackathon April 18-19 [3] at the ARM Cambridge, UK Headquarters so if you can
> > attend and this topic interests you, consider attending.
>
> I hope that you will be there as one of the biggest proponents of EFI entry point.
> If you does not it will be difficult or impossible to discuss this issue without you.
> In the worst case I can raise this topic on behalf of you and then we should organize
> phone call if possible (and accepted by others). However, to do that I must know your
> plans in advance.

I'll be there!

Luis

2016-04-11 05:12:14

by Jürgen Groß

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On 08/04/16 22:40, Luis R. Rodriguez wrote:
> On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
>> On 06/04/16 03:40, Luis R. Rodriguez wrote:
>>>
>>> * You don't need full EFI emulation
>>
>> I think needing any EFI emulation inside Xen (which is where it would
>> need to be for dom0) is not suitable because of the increase in
>> hypervisor ABI.
>
> Is this because of timing on architecture / design of HVMLite, or
> a general position that the complexity to deal with EFI emulation
> is too much for Xen's taste ?

The Xen hypervisor should be as small as possible. Adding an EFI
emulator will be adding quite some code. This should be done after a
very thorough evaluation only.

> ARM already went the EFI entry way for domU -- it went the OVMF route,
> would such a possibility be possible for x86 domU HVMLite ? If not why
> not, I mean it would seem to make sense to at least mimic the same type
> of early boot environment, and perhaps there are some lessons to be
> learned from that effort too.

The final solution must be appropriate for dom0, too. So don't try
to limit the discussion to domU. If the dom0 solution isn't going to be
acceptable there will be no need to discuss domU.

> Are there some lessons to be learned with ARM's effort? What are they?
> If that could be re-done again with any type of cleaner path, what
> could that be that could help the x86 side ?
>
> Although emulating EFI may require work, some folks have pointed out
> that the amount of work may not be that much. If that is done can
> we instead rely on the same code to replace OVMF to support both
> Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
> this ?
>
>> I also still do not understand your objection to the current tiny stub.
>
> It's more of a hypothetical -- can an EFI entry be used instead given
> it already does exactly what the new small entry does ? It's also rather
> odd to add a new entry without evaluating fully a possible alternative
> that would provide the same exact mechanism.

The interface isn't the new entry only. It should be evaluated how much
of the early EFI boot path would be common to the HVMlite one. What
would be gained by using the same entry but having two different boot
paths after it? You still need a way to distinguish between bare metal
EFI and HVMlite. And Xen needs a way to find out whether a kernel is
supporting HVMlite to boot it in the correct mode.

> A full technical unbiased evaluation of the different approaches is what I'd
> hope we could strive to achieve through discussion and peer review, thinking
> and prioritizing ultimately what is best to minimize the impact on Linux
> and also help take advantage of the best features possible through both
> means. Thinking long term, not immediate short term.

Sure.


Juergen

2016-04-12 21:03:15

by Andy Lutomirski

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On Sun, Apr 10, 2016 at 10:12 PM, Juergen Gross wrote:
> On 08/04/16 22:40, Luis R. Rodriguez wrote:
>> On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
>>> On 06/04/16 03:40, Luis R. Rodriguez wrote:
>>>>
>>>> * You don't need full EFI emulation
>>>
>>> I think needing any EFI emulation inside Xen (which is where it would
>>> need to be for dom0) is not suitable because of the increase in
>>> hypervisor ABI.
>>
>> Is this because of timing on architecture / design of HVMLite, or
>> a general position that the complexity to deal with EFI emulation
>> is too much for Xen's taste ?
>
> The Xen hypervisor should be as small as possible. Adding an EFI
> emulator will be adding quite some code. This should be done after a
> very thorough evaluation only.
>
>> ARM already went the EFI entry way for domU -- it went the OVMF route,
>> would such a possibility be possible for x86 domU HVMLite ? If not why
>> not, I mean it would seem to make sense to at least mimic the same type
>> of early boot environment, and perhaps there are some lessons to be
>> learned from that effort too.
>
> The final solution must be appropriate for dom0, too. So don't try
> to limit the discussion to domU. If dom0 isn't going to be acceptable
> there will be no need to discuss domU.
>
>> Are there some lessons to be learned with ARM's effort? What are they?
>> If that could be re-done again with any type of cleaner path, what
>> could that be that could help the x86 side ?
>>
>> Although emulating EFI may require work, some folks have pointed out
>> that the amount of work may not be that much. If that is done can
>> we instead rely on the same code to replace OVMF to support both
>> Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
>> this ?
>>
>>> I also still do not understand your objection to the current tiny stub.
>>
>> It's more of a hypothetical -- can an EFI entry be used instead given
>> it already does exactly what the new small entry does ? It's also rather
>> odd to add a new entry without evaluating fully a possible alternative
>> that would provide the same exact mechanism.
>
> The interface isn't the new entry only. It should be evaluated how much
> of the early EFI boot path would be common to the HVMlite one. What
> would be gained by using the same entry but having two different boot
> paths after it? You still need a way to distinguish between bare metal
> EFI and HVMlite. And Xen needs a way to find out whether a kernel is
> supporting HVMlite to boot it in the correct mode.
>
>> A full technical unbiased evaluation of the different approaches is what I'd
>> hope we could strive to achieve through discussion and peer review, thinking
>> and prioritizing ultimately what is best to minimize the impact on Linux
>> and also help take advantage of the best features possible through both
>> means. Thinking long term, not immediate short term.
>
> Sure.

FWIW, someone just pointed me to u-boot's EFI implementation.
u-boot's lib/efi_loader contains a tiny (<3k LOC, 10kB compiled) UEFI
implementation that's sufficient to boot a Linux EFI payload.

An argument against making Xen's default domU entry use UEFI is that
it might become unnecessarily awkward to do something like
chainloading to OVMF. But maybe OVMF can be compiled as a UEFI
binary :)

--Andy

2016-04-12 22:12:30

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> On Fri, Apr 08, 2016 at 03:16:14PM +0100, George Dunlap wrote:
> > On 07/04/16 19:51, Luis R. Rodriguez wrote:
> > > While Andrew's position is right in that perhaps only Xen tools have to deal
> > > with the HVMLite specific entry, it would also still mean diverging from ARM's
> > > own EFI entry only position, which I'd like to clarify that ARM has no custom
> > > Xen entry, we should strive to match that. Anything far from that to me really
> > > deserves an explanation, especially if we are going to argue that HVMLite is
> > > the best that x86 Xen can do.
> > >
> > > Ultimately unifying entry approaches for Xen in a streamlined fashion seems
> > > like a sensible thing to strive for. Anything we push in the other direction,
> > > as small as it can be, should deserve at least a 'hey, wait a minute'...
> >
> > Quick factual correction here.
> >
> > "Since ARM guests only use the EFI entry point, x86 guests should also
> > only use the EFI entry point" is certainly a reasonable argument to make.
> >
> > However, dom0 on ARM does not use the EFI entry point. When starting
> > dom0, Xen uses the native entry point (the one that UBoot uses) and
> > hands dom0 a device-tree node. The reason this is possible on ARM is
> > that there are no assumptions made about what hardware is or is not
> > present on the system -- everything that needs to be communicated about
> > what is or is not present can be passed in DT.
> >
> > So it is incorrect to say that ARM has an "EFI entry only" position.
> >
> > (On ACPI systems, it does apparently generate some UEFI informational
> > tables, which it passes to the dom0 kernel via DT; and the kernel
> > unpacks and puts in the right place. Normal Xen ARM guests can use EFI,
> > but that's because we start OVMF in the guest context to provide the EFI
> > services. These may be where the idea that ARM guests use only the UEFI
> > entry point came from.)
> >
> > Obviously it would be nice if we could use the native entry point on x86
> > as well, but there's decades of legacy hardware and backwards
> > compatibility to deal with there.
>
> OK thanks for the clarification -- still no custom entries for Xen!
> We should strive for that, at the very least.
>
> You do have a point about the legacy stuff. There are two options there:
>
> * Fold legacy support under HVMLite -- which seems to be what we
> currently want to do (we should evaluate the implications and
> requirements here for that); or
>
> * Leave legacy stuff on the old PV path; this may be something to
> bring to the table if we had in place a proactive solution to
> avoid further fallout from the architecture of the huge differences
> on the entries. The work I'm doing should help with that. (We should
> also evaluate the implications and requirements here for that as
> well).

Also, x86 does have a history of short DT use. Just pointing out that it's there as
an option as well. I'll Cc you on some thread about that.

Luis

2016-04-13 09:02:12

by Roger Pau Monne

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On Tue, Apr 12, 2016 at 02:02:52PM -0700, Andy Lutomirski wrote:
> On Sun, Apr 10, 2016 at 10:12 PM, Juergen Gross wrote:
> > On 08/04/16 22:40, Luis R. Rodriguez wrote:
> >> On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
> >>> On 06/04/16 03:40, Luis R. Rodriguez wrote:
> >>>>
> >>>> * You don't need full EFI emulation
> >>>
> >>> I think needing any EFI emulation inside Xen (which is where it would
> >>> need to be for dom0) is not suitable because of the increase in
> >>> hypervisor ABI.
> >>
> >> Is this because of timing on architecture / design of HVMLite, or
> >> a general position that the complexity to deal with EFI emulation
> >> is too much for Xen's taste ?
> >
> > The Xen hypervisor should be as small as possible. Adding an EFI
> > emulator will be adding quite some code. This should be done after a
> > very thorough evaluation only.
> >
> >> ARM already went the EFI entry way for domU -- it went the OVMF route,
> >> would such a possibility be possible for x86 domU HVMLite ? If not why
> >> not, I mean it would seem to make sense to at least mimic the same type
> >> of early boot environment, and perhaps there are some lessons to be
> >> learned from that effort too.
> >
> > The final solution must be appropriate for dom0, too. So don't try
> > to limit the discussion to domU. If dom0 isn't going to be acceptable
> > there will be no need to discuss domU.
> >
> >> Are there some lessons to be learned with ARM's effort? What are they?
> >> If that could be re-done again with any type of cleaner path, what
> >> could that be that could help the x86 side ?
> >>
> >> Although emulating EFI may require work, some folks have pointed out
> >> that the amount of work may not be that much. If that is done can
> >> we instead rely on the same code to replace OVMF to support both
> >> Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
> >> this ?
> >>
> >>> I also still do not understand your objection to the current tiny stub.
> >>
> >> It's more of a hypothetical -- can an EFI entry be used instead given
> >> it already does exactly what the new small entry does ? It's also rather
> >> odd to add a new entry without evaluating fully a possible alternative
> >> that would provide the same exact mechanism.
> >
> > The interface isn't the new entry only. It should be evaluated how much
> > of the early EFI boot path would be common to the HVMlite one. What
> > would be gained by using the same entry but having two different boot
> > paths after it? You still need a way to distinguish between bare metal
> > EFI and HVMlite. And Xen needs a way to find out whether a kernel is
> > supporting HVMlite to boot it in the correct mode.
> >
> >> A full technical unbiased evaluation of the different approaches is what I'd
> >> hope we could strive to achieve through discussion and peer review, thinking
> >> and prioritizing ultimately what is best to minimize the impact on Linux
> >> and also help take advantage of the best features possible through both
> >> means. Thinking long term, not immediate short term.
> >
> > Sure.
>
> FWIW, someone just pointed me to u-boot's EFI implementation.
> u-boot's lib/efi_loader contains a tiny (<3k LOC, 10kB compiled) UEFI
> implementation that's sufficient to boot a Linux EFI payload.

I guess this is a pretty minimal EFI implementation. Is this something
standard, or just an EFI implementation tailored to Linux needs? (i.e., is
there any standard EFI flag to signal this kind of minimal EFI environment?)

> An argument against making Xen's default domU entry use UEFI is that
> it might become unnecessarily awkward to do something like
> chainloading to OVMF. But maybe OVMF can be compiled as a UEFI
> binary :)

With my FreeBSD committer hat:

The FreeBSD kernel doesn't contain an EFI entry point, it just contains one
single entry point that's used for both legacy BIOS and EFI. Then the
FreeBSD loader is the one that contains the different entry points. I would
really like to avoid adding an EFI entry point and the PE header to the
FreeBSD kernel. The current trampoline in FreeBSD to tie the Xen entry point
into the native path contains 96 lines of assembly (half of them are
actually comments) and 66 lines of C. I think adding an EFI entry point is
going to add a lot more code than this, and we would probably need
changes to the build system in order to assemble the PE header and the ELF
headers together.

IMHO, if we want to boot PVH using EFI the right solution is to use OVMF (or
any other UEFI firmware) and port it so it's able to run as a PVH guest. I
guess it should even be possible to use it for Dom0, although I think this
is cumbersome.

Roger.

2016-04-13 09:54:42

by Roger Pau Monne

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> On Fri, Apr 08, 2016 at 03:16:14PM +0100, George Dunlap wrote:
> > On 07/04/16 19:51, Luis R. Rodriguez wrote:
> > > While Andrew's position is right in that perhaps only Xen tools have to deal
> > > with the HVMLite specific entry, it would also still mean diverging from ARM's
> > > own EFI entry only position, which I'd like to clarify that ARM has no custom
> > > Xen entry, we should strive to match that. Anything far from that to me really
> > > deserves an explanation, especially if we are going to argue that HVMLite is
> > > the best that x86 Xen can do.
> > >
> > > Ultimately unifying entry approaches for Xen in a streamlined fashion seems
> > > like a sensible thing to strive for. Anything we push in the other direction,
> > > as small as it can be, should deserve at least a 'hey, wait a minute'...
> >
> > Quick factual correction here.
> >
> > "Since ARM guests only use the EFI entry point, x86 guests should also
> > only use the EFI entry point" is certainly a reasonable argument to make.
> >
> > However, dom0 on ARM does not use the EFI entry point. When starting
> > dom0, Xen uses the native entry point (the one that UBoot uses) and
> > hands dom0 a device-tree node. The reason this is possible on ARM is
> > that there are no assumptions made about what hardware is or is not
> > present on the system -- everything that needs to be communicated about
> > what is or is not present can be passed in DT.
> >
> > So it is incorrect to say that ARM has an "EFI entry only" position.
> >
> > (On ACPI systems, it does apparently generate some UEFI informational
> > tables, which it passes to the dom0 kernel via DT; and the kernel
> > unpacks and puts in the right place. Normal Xen ARM guests can use EFI,
> > but that's because we start OVMF in the guest context to provide the EFI
> > services. These may be where the idea that ARM guests use only the UEFI
> > entry point came from.)
> >
> > Obviously it would be nice if we could use the native entry point on x86
> > as well, but there's decades of legacy hardware and backwards
> > compatibility to deal with there.
>
> OK thanks for the clarification -- still no custom entries for Xen!
> We should strive for that, at the very least.
>
> You do have a point about the legacy stuff. There are two options there:
>
> * Fold legacy support under HVMLite -- which seems to be what we
> currently want to do (we should evaluate the implications and
> requirements here for that); or

I'm not following here. What does it mean to fold legacy support under
HVMlite? HVMlite doesn't have any legacy hardware, and that's the issue when
it comes to using native Linux entry points. Linux might expect some legacy
PC hardware to be always present, which is not true for HVMlite.

Could you please clarify this point?

> * Leave legacy stuff on the old PV path; this may be something to
> bring to the table if we had in place a proactive solution to
> avoid further fallout from the architecture of the huge differences
> on the entries. The work I'm doing should help with that. (We should
> also evaluate the implications and requirements here for that as
> well).

Classic PV guests don't have legacy hardware at all, they just have PV
interfaces, so I'm even less sure of what this means.

Roger.

2016-04-13 10:04:27

by Roger Pau Monne

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 06, 2016 at 04:02:40PM +0100, Matt Fleming wrote:
[...]
> One place that struck me as suitable for this "hypercall in an EFI
> service stub" approach is the trouble with doing ACPI reboot as
> documented here,
>
> http://lists.xen.org/archives/html/xen-devel/2016-02/msg01609.html
>
> Performing the reset hypercall from within HVMlite's custom EfiReset()
> service would avoid having to touch ACPICA at all, and would be
> indistinguishable from bare metal.

I don't get this, the "reset/shutdown" hypercall requires the following
steps from Dom0 (it's not as simple as calling a hypercall):

The way to perform a full system power off from Dom0 is different than
what's done in a DomU guest. In order to perform a power off from Dom0 the
native ACPI path should be followed, but the guest should not write the
`SLP_EN` bit to the Pm1Control register. Instead the
`XENPF_enter_acpi_sleep` hypercall should be used, filling the following
data in the `xen_platform_op` struct:

    cmd = XENPF_enter_acpi_sleep
    interface_version = XENPF_INTERFACE_VERSION
    u.enter_acpi_sleep.pm1a_cnt_val = Pm1aControlValue
    u.enter_acpi_sleep.pm1b_cnt_val = Pm1bControlValue

At which point it means that we are either going to duplicate ACPICA code
into the HVMlite's custom EfiReset() service, or we are going to call into
ACPICA, which is what we already do now.
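
To make the field list above concrete, here is a rough sketch of assembling that request. It only models the four fields named in this mail; the numeric constants and the 16-bit range check are assumptions that should be checked against Xen's public headers, and `EnterAcpiSleepOp` / `build_enter_acpi_sleep` are hypothetical names used for illustration only. It does not issue any hypercall.

```python
from dataclasses import dataclass

# Assumption: illustrative values only; real code must take these from
# Xen's public platform header rather than from this sketch.
XENPF_enter_acpi_sleep = 51
XENPF_INTERFACE_VERSION = 0x03000001


@dataclass(frozen=True)
class EnterAcpiSleepOp:
    """Hypothetical model of the fields listed in the mail above."""
    cmd: int
    interface_version: int
    pm1a_cnt_val: int
    pm1b_cnt_val: int


def build_enter_acpi_sleep(pm1a_control_value, pm1b_control_value):
    """Fill the request with the PM1a/PM1b control values computed by the
    native ACPI path, instead of writing them to the hardware registers."""
    for value in (pm1a_control_value, pm1b_control_value):
        # Assumption: the PM1 control values are treated as 16-bit quantities.
        if not 0 <= value <= 0xFFFF:
            raise ValueError("PM1 control value out of 16-bit range")
    return EnterAcpiSleepOp(
        cmd=XENPF_enter_acpi_sleep,
        interface_version=XENPF_INTERFACE_VERSION,
        pm1a_cnt_val=pm1a_control_value,
        pm1b_cnt_val=pm1b_control_value,
    )
```

The point of the sketch is that the control values still have to come from somewhere, which is the ACPI sleep path discussed above.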

Roger.

2016-04-13 10:05:24

by George Dunlap

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Tue, Apr 12, 2016 at 11:12 PM, Luis R. Rodriguez wrote:
> Also, x86 does have a history of short DT use. Just pointing out that it's there as
> an option as well. I'll Cc you on some thread about that.

I'm not sure how this is relevant to anything.

What we're talking about is how to get from Xen to a point in the
Linux kernel where everything can Just Work. The proposed feature is
a mini trampoline that (as I understand it):
1. Tells Xen where to jump to (via ELF note)
2. Sets up some basic modes and pagetables and then jumps to the zero
page so Linux can just carry on.
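
As an illustration of step 1, here is a minimal sketch of how a loader could walk an ELF note section and pick out a Xen entry-point note. The generic note layout (namesz, descsz, type, then 4-byte padded name and descriptor) is standard ELF; the note type value 18 for the 32-bit physical entry and the little-endian 4- or 8-byte descriptor are assumptions to be checked against Xen's public elfnote header. `build_note`, `iter_notes` and `find_xen_entry` are hypothetical helper names.

```python
import struct

XEN_NOTE_NAME = b"Xen"
# Assumption: note type of the PVH/HVMlite 32-bit physical entry point.
XEN_ELFNOTE_PHYS32_ENTRY = 18


def _align4(n):
    """Round n up to the next multiple of 4 (ELF note padding)."""
    return (n + 3) & ~3


def build_note(name, note_type, desc):
    """Serialize one little-endian ELF note (used here to make test input)."""
    name_z = name + b"\0"
    header = struct.pack("<III", len(name_z), len(desc), note_type)
    return (header
            + name_z.ljust(_align4(len(name_z)), b"\0")
            + desc.ljust(_align4(len(desc)), b"\0"))


def iter_notes(blob):
    """Yield (name, type, desc) for each note in a raw note section."""
    off = 0
    while off + 12 <= len(blob):
        namesz, descsz, note_type = struct.unpack_from("<III", blob, off)
        off += 12
        name = blob[off:off + namesz].rstrip(b"\0")
        off += _align4(namesz)
        desc = blob[off:off + descsz]
        off += _align4(descsz)
        yield name, note_type, desc


def find_xen_entry(blob):
    """Return the entry address from the Xen note, or None if absent."""
    for name, note_type, desc in iter_notes(blob):
        if name == XEN_NOTE_NAME and note_type == XEN_ELFNOTE_PHYS32_ENTRY:
            if len(desc) in (4, 8):
                return int.from_bytes(desc, "little")
    return None
```

A kernel without the note simply yields None here, which is also how the loader can tell that the image does not advertise this entry point.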

-George

2016-04-13 10:15:20

by Matt Fleming

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, 13 Apr, at 11:02:02AM, Roger Pau Monné wrote:
>
> With my FreeBSD committer hat:
>
> The FreeBSD kernel doesn't contain an EFI entry point, it just contains one
> single entry point that's used for both legacy BIOS and EFI. Then the
> FreeBSD loader is the one that contains the different entry points. I would
> really like to avoid adding an EFI entry point and the PE header to the
> FreeBSD kernel. The current trampoline in FreeBSD to tie the Xen entry point
> into the native path contains 96 lines of assembly (half of them are
> actually comments) and 66 lines of C. I think adding an EFI entry point is
> going to add a lot more code than this, and we would probably need
> changes to the build system in order to assemble the PE header and the ELF
> headers together.

What does the boot flow look like for PVH2 on FreeBSD today?
Presumably it doesn't have the same entry point that Boris proposed
for Linux?

Does it go, Hypervisor -> FreeBSD loader -> FreeBSD kernel? Or are you
able to directly boot the kernel from the hypervisor and skip the
middle part by having secondary entry point for Xen marked by the ELF
note?

> IMHO, if we want to boot PVH using EFI the right solution is to use OVMF (or
> any other UEFI firmware) and port it so it's able to run as a PVH guest. I
> guess it should even be possible to use it for Dom0, although I think this
> is cumbersome.

There are two levels of EFI boot entry features being discussed,

1. Make the OS kernel a PE/COFF executable
2. Provide some level of EFI service functionality

You can adopt 1. without 2, i.e. without actually providing any EFI
services at all, as long as the Xen hypervisor grows a PE/COFF loader
(since EFI firmware has to provide you one, for EFI platforms you
could use the LoadImage() service in the firmware, but for BIOS
platforms you'd need your own in Xen).
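
To give an idea of the very first step such a loader would need, here is a rough sketch of telling a PE/COFF image from an ELF one by its headers. It only checks the well-known magics (ELF's 0x7f "ELF" at offset 0, the "MZ" stub at offset 0 with the PE signature offset stored at 0x3C); it is nowhere near a real loader, and `image_kind` is a hypothetical name.

```python
import struct


def image_kind(data):
    """Classify a kernel image as 'elf', 'pe' or 'unknown' from its headers."""
    # ELF images start with the 4-byte magic 0x7f 'E' 'L' 'F'.
    if data[:4] == b"\x7fELF":
        return "elf"
    # PE/COFF images start with an 'MZ' stub; the 32-bit little-endian
    # value at offset 0x3C points at the 'PE\0\0' signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_off,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_off:pe_off + 4] == b"PE\0\0":
            return "pe"
    return "unknown"
```

Note that both formats put their magic at offset 0 of the file, so a single check at the start of the image is enough to pick the loading path.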

On Linux, this has the advantage of deferring the decompression of the
bzImage (x86 Linux kernel file format) to the stub on the front of the
bzImage. And while I realise that the toolstack already has support
for decompressing bzImages, given what Andrew has said about reducing
attack surface, having the guest perform the decompression should be a
win.

Of course, this is offset somewhat by the fact that you need to audit
the PE/COFF loader ;) But decompression in general is notoriously
vulnerable to security issues.

Using the in-kernel decompressor is how most (all?) Linux boot loaders
work today, so there's the added benefit of reducing the differences
between booting on Xen and booting bare metal. For example, you'd
probably be able to use CONFIG_RANDOMIZE_BASE (ASLR for kernel image)
for Xen if you use the kernel's decompressor. Xen would also get
future features in this area for free, and there is a tendency to push
boot features into the early stub.

For 1. we'd basically be using the PE/COFF file format with the EFI
ABI as an OS agnostic boot protocol, but not as a full firmware
runtime environment.

2. is also interesting, though I think less so than 1. I agree that
making OVMF work as a PVH guest is probably the right way to go, even
for Dom0, not least because you'd have a much cleaner/less buggy
implementation than what we see in the real world ;)

2016-04-13 10:22:01

by Matt Fleming

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, 13 Apr, at 12:03:12PM, Roger Pau Monné wrote:
>
> I don't get this, the "reset/shutdown" hypercall requires the following
> steps from Dom0 (it's not as simple as calling a hypercall):
>
> The way to perform a full system power off from Dom0 is different than
> what's done in a DomU guest. In order to perform a power off from Dom0 the
> native ACPI path should be followed, but the guest should not write the
> `SLP_EN` bit to the Pm1Control register. Instead the
> `XENPF_enter_acpi_sleep` hypercall should be used, filling the following
> data in the `xen_platform_op` struct:
>
> cmd = XENPF_enter_acpi_sleep
> interface_version = XENPF_INTERFACE_VERSION
> u.enter_acpi_sleep.pm1a_cnt_val = Pm1aControlValue
> u.enter_acpi_sleep.pm1b_cnt_val = Pm1bControlValue
>
> At which point it means that we are either going to duplicate ACPICA code
> into the HVMlite's custom EfiReset() service, or we are going to call into
> ACPICA, which is what we already do now.

Fair enough, I wasn't aware that you needed to call into ACPI to
perform the reset.

2016-04-13 10:25:15

by Roger Pau Monne

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 13, 2016 at 12:12:25AM +0200, Luis R. Rodriguez wrote:
[...]
> Also, x86 does have a history of short DT use. Just pointing out that it's there as
> an option as well. I'll Cc you on some thread about that.

I don't see how this is relevant to the conversation that's going on:

How much x86 hardware provides DT? I bet this is 0%.

How many OSes can boot on x86 using DT? Linux maybe, certainly FreeBSD,
Windows or OpenBSD won't be able to boot at all when provided a DT on x86.

Is Xen going to craft a DT for x86 based on ACPI? No, because it can't parse
the DSDT or other dynamic tables that contain the information about
the devices in the system.

I would also like to point out that DT or not DT is not really the problem
here, the issue that George was trying to point out is that on x86 there's
some legacy hardware that's considered to be always there, so its presence
is not signaled by ACPI, and HVMlite is _not_ emulating this hardware. It
doesn't matter if the hardware description comes from ACPI or DT, this
hardware is considered to be always present on PC compatible hardware.

Roger.

2016-04-13 10:40:08

by Matt Fleming

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, 13 Apr, at 11:15:15AM, Matt Fleming wrote:
>
> For 1. we'd basically be using the PE/COFF file format with the EFI
> ABI as an OS agnostic boot protocol, but not as a full firmware
> runtime environment.

To add some balance to this proposal (since there's no such thing as a
free lunch) some of the disadvantages are,

The PE/COFF stub in Linux does assume that it is executing in native
cpu mode and does not perform any mode switching, i.e. from 32-bit
protected to long mode. This is due to the way that EFI works - by the
time the OS image entry point is jumped to on a 64-bit cpu we're
running in long mode with identity mapped page tables. To be fair,
when running Xen on EFI (bare metal) this would save you one cpu mode
switch when compared with the current HVMLite proposal.

I'm not aware of a direct equivalent for ELF notes in the PE/COFF
format. I'm still re-reading the spec to find something suitable.

2016-04-13 11:12:48

by George Dunlap

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 13, 2016 at 11:15 AM, Matt Fleming wrote:
> For 1. we'd basically be using the PE/COFF file format with the EFI
> ABI as an OS agnostic boot protocol, but not as a full firmware
> runtime environment.

But we still have the issue here that now the EFI entry point in
Linux has to figure out, "Am I running in a full firmware runtime
environment, or am I running under Xen?", and then change behavior
appropriately. Then we get back to Juergen's comment: "[The EFI
proposal] should be evaluated how much of the early EFI boot path
would be common to the HVMlite one. What would be gained by using the
same entry but having two different boot paths after it?"

> 2. is also interesting, though I think less so than 1. I agree that
> making OVMF work as a PVH guest is probably the right way to go, even
> for Dom0, not least because you'd have a much cleaner/less buggy
> implementation than what we see in the real world ;)

So rather than just add an extra entry point and a Xen-to-zero-page
stub, you're going to ask Xen on dom0 to import a full OVMF binary?
Or have the bootloader entries include xen, linux, the initrd, *and*
ovmf? That seems a bit extreme. :-)

Keep in mind also that PVH needs to support not only the traditional
VM use-case (e.g., booting a full distro), but the small service VM
use-case (a la unikernels). Booting a traditional distro as a domU via
OVMF -> EFI Linux makes sense; it reduces the distro's test burden,
and the OVMF doesn't add a lot to the memory or boot time compared to
the size and boot time of a full distro. But booting tiny service
VMs, sometimes with not even any disk of their own (other than a
ramdisk), the extra cost of including OVMF in the guest address space
can be a non-negligible addition to the memory requirements and
boot-up time.

One of the reasons Xen on ARM prioritized getting EFI working for
domUs was that a representative from a certain distro vendor made it
absolutely clear that *their* distro would *only* support booting via
EFI on ARM. But you can still, as I understand it, use uBoot with DT
to boot a lightweight domU if you want.

-George

2016-04-13 11:59:45

by Roger Pau Monne

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 13, 2016 at 11:15:15AM +0100, Matt Fleming wrote:
> On Wed, 13 Apr, at 11:02:02AM, Roger Pau Monné wrote:
> >
> > With my FreeBSD committer hat:
> >
> > The FreeBSD kernel doesn't contain an EFI entry point, it just contains one
> > single entry point that's used for both legacy BIOS and EFI. Then the
> > FreeBSD loader is the one that contains the different entry points. I would
> > really like to avoid adding an EFI entry point and the PE header to the
> > FreeBSD kernel. The current trampoline in FreeBSD to tie the Xen entry point
> > into the native path contains 96 lines of assembly (half of them are
> > actually comments) and 66 lines of C. I think adding an EFI entry point is
> > going to add a lot more code than this, and we would probably need
> > changes to the build system in order to assemble the PE header and the ELF
> > headers together.
>
> What does the boot flow look like for PVH2 on FreeBSD today?
> Presumably it doesn't have the same entry point that Boris proposed
> for Linux?

Yes it does have something quite similar to the entry point that Boris
proposed for Linux.

> Does it go, Hypervisor -> FreeBSD loader -> FreeBSD kernel? Or are you
> able to directly boot the kernel from the hypervisor and skip the
> middle part by having secondary entry point for Xen marked by the ELF
> note?

We skip the bootloader and Xen loads the FreeBSD kernel directly using the
ELF note that contains the PVH entry point.

I certainly want to be able to run the FreeBSD loader inside of a PVH guest,
but I plan to simply chainload it from OVMF, so it would look like:

Hypervisor -> OVMF -> FreeBSD EFI loader -> FreeBSD kernel

> > IMHO, if we want to boot PVH using EFI the right solution is to use OVMF (or
> > any other UEFI firmware) and port it so it's able to run as a PVH guest. I
> > guess it should even be possible to use it for Dom0, although I think this
> > is cumbersome.
>
> There are two levels of EFI boot entry features being discussed,
>
> 1. Make the OS kernel a PE/COFF executable
> 2. Provide some level of EFI service functionality
>
> You can adopt 1. without 2, i.e. without actually providing any EFI
> services at all, as long as the Xen hypervisor grows a PE/COFF loader
> (since EFI firmware has to provide you one, for EFI platforms you
> could use the LoadImage() service in the firmware, but for BIOS
> platforms you'd need your own in Xen).

We could use native LoadImage for Dom0 maybe if we are booted on an EFI
platform, but for DomUs we certainly need to implement our own inside of
Xen, at which point we could do the same and always use the one inside of
Xen in order to avoid diverging paths.

TBH, I don't think this is the right solution. We would force every OS
kernel that wants to be loaded using Xen to become a PE/COFF executable.
This also includes unikernels like MirageOS, which will be forced to become
a PE/COFF executable.

Is this header compatible with the ELF header? Can both co-exist in the
same binary without issues?

> On Linux, this has the advantage of deferring the decompression of the
> bzImage (x86 Linux kernel file format) to the stub on the front of the
> bzImage. And while I realise that the toolstack already has support
> for decompressing bzImages, given what Andrew has said about reducing
> attack surface, having the guest perform the decompression should be a
> win.
>
> Of course, this is offset somewhat by the fact that you need to audit
> the PE/COFF loader ;) But decompression in general is notoriously
> vulnerable to security issues.
>
> Using the in-kernel decompressor is how most (all?) Linux boot loaders
> work today, so there's the added benefit of reducing the differences
> between booting on Xen and booting bare metal. For example, you'd
> probably be able to use CONFIG_RANDOMIZE_BASE (ASLR for kernel image)
> for Xen if you use the kernel's decompressor. Xen would also get
> future features in this area for free, and there is a tendency to push
> boot features into the early stub.

All the issues that you mention above are also solved by chainloading OVMF
instead of directly loading the guest kernel, and it avoids adding a PE/COFF
loader into Xen.

> For 1. we'd basically be using the PE/COFF file format with the EFI
> ABI as an OS agnostic boot protocol, but not as a full firmware
> runtime environment.

This also means that we will be adding PE/COFF headers to (uni)kernels, but
we still won't implement full EFI support inside of them, so although it
would seem like they are capable of being loaded by a native EFI loader,
they would not be.

This seems misleading, and I think it's going to cause grief amongst OS
developers in general. The current proposed entry point is unique to Xen
(it's only mentioned in Xen ELF notes), and is certainly not going to cause
confusion at all.

Also, doesn't this (the fact that Xen will use the EFI entry point
without a runtime environment) mean that there are going to be diverging
paths inside of Linux EFI entry point anyway?

At which point, does it really matter that much if this divergence includes
a new entry point or not?

> 2. is also interesting, though I think less so than 1. I agree that
> making OVMF work as a PVH guest is probably the right way to go, even
> for Dom0, not least because you'd have a much cleaner/less buggy
> implementation than what we see in the real world ;)

I think we all agree that this is not suitable.

Roger.

2016-04-13 15:50:10

by George Dunlap

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <[email protected]> wrote:
> So more to it, if the EFI entry already provides a way into Linux
> in a more streamlined fashion bringing it closer to the bare metal
> boot entry, why *would* we add another boot entry to x86, even if
> its small and self contained ?

We would avoid using EFI if:

* Being called both on real hardware and under Xen would make the EFI
entry point more complicated

* Adding the necessary EFI support into Xen would be a significant
chunk of extra work

* Requiring PVH mode to implement EFI would make it more difficult for
other kernels (NetBSD, FreeBSD) to act as dom0s.

* Requiring PVH mode to use EFI would make it more difficult to
support unikernel-style workloads for domUs.

Now as has been pointed out, we don't know for a lot of the above
things for certain, because nobody has posted any code. None of us
really want to post any code because:

* Reading and understanding the EFI spec, the Linux EFI path, and
implementing all that on both the Xen and the Linux side is a lot of
work

* It looks pretty likely that many of the above things will be true

* The only real objection to the currently proposed solution is really weak.

If you want to post some code I'm sure we could give you feedback on it.

> Another position against small stubs which I listed myself is that we may need
> more semantics for early boot even if the new HVMLite small stub is added. This
> remains to be seen. If we are going to add new semantics, it would seem best to
> use something more standard like EFI configuration tables rather than hack on
> to x86 further custom semantics. Custom sloppy semantics have proven to be
> misused, and were ultimately a sloppy mess.
[snip]
>> That sounds like it's going to make the EFI path just as unmanageable as the
>> current PV path.
>
> Can you describe how?
>
>> Using the EFI entry point would certainly make sense if it was
>> actually simpler than the proposed extra entry point. But it sounds
>> like it's going to be more complicated, not only for Xen, but also for
>> Linux.
>
> How so? Please provide specifics.

Here is the juxtaposition that confuses me. The problem with a lot of
the current code is that you have virtualization-specific hacks all
over the place making things complicated. And in the first quote
above, you seem afraid that the extra entry point with stub code will
somehow be misused and end up in a similar "sloppy mess", even though
it's not at all clear how *having a stub entry point* could be
"abused" by anyone. But then when I suggest that sharing a codepath
between systems that have actual EFI firmware, with platform hardware,
and a system that has no EFI firmware and no similar concept of the
hardware, might end up a sloppy mess of Xen-specific if clauses and
maintenance headaches due to broken assumptions, it doesn't even
register with you as a reasonable concern?

As Matt said, nobody will be able to provide specifics until someone
tries to code it up. But coding things up is not free.

-George

2016-04-13 18:29:56

by Luis Chamberlain

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

On Mon, Apr 11, 2016 at 07:12:08AM +0200, Juergen Gross wrote:
> On 08/04/16 22:40, Luis R. Rodriguez wrote:
> > On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
> >> On 06/04/16 03:40, Luis R. Rodriguez wrote:
> >>>
> >>> * You don't need full EFI emulation
> >>
> >> I think needing any EFI emulation inside Xen (which is where it would
> >> need to be for dom0) is not suitable because of the increase in
> >> hypervisor ABI.
> >
> > Is this because of timing on architecture / design of HVMLite, or
> > a general position that the complexity to deal with EFI emulation
> > is too much for Xen's taste ?
>
> The Xen hypervisor should be as small as possible. Adding an EFI
> emulator will be adding quite some code. This should be done after a
> very thorough evaluation only.

Sure.

> > ARM already went the EFI entry way for domU -- it went the OVMF route,
> > would such a possibility be possible for x86 domU HVMLite ? If not why
> > not, I mean it would seem to make sense to at least mimic the same type
> > of early boot environment, and perhaps there are some lessons to be
> > learned from that effort too.
>
> The final solution must be appropriate for dom0, too. So don't try
> to limit the discussion to domU. If dom0 isn't going to be acceptable
> there will be no need to discuss domU.

Understood. George noted that on ARM dom0 still uses the ARM native entry
point, it seems to accomplish this as it uses a device tree node. I'll
chime in on that in another thread.

> > Are there some lessons to be learned with ARM's effort? What are they?
> > If that could be re-done again with any type of cleaner path, what
> > could that be that could help the x86 side ?
> >
> > Although emulating EFI may require work, some folks have pointed out
> > that the amount of work may not be that much. If that is done can
> > we instead rely on the same code to replace OVMF to support both
> > Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
> > this ?
> >
> >> I also still do not understand your objection to the current tiny stub.
> >
> > It's more of a hypothetical -- can an EFI entry be used instead given
> > it already does exactly what the new small entry does ? It's also rather
> > odd to add a new entry without evaluating fully a possible alternative
> > that would provide the same exact mechanism.
>
> The interface isn't the new entry only. It should be evaluated how much
> of the early EFI boot path would be common to the HVMlite one.

We also have other asm code which can be shared. I'll reply to Boris'
original e-mail with what I can identify as perhaps sharable. There is
obviously more as you allude.

> What would be gained by using the same entry but having two different boot
> paths after it?

It's a good question. In summary, for me it would be the push for sharing more
code and the push for semantics on early boot to address differences
proactively, and ultimately it may enable us to bring the old PV boot path
closer as well.

I'll elaborate on this but first let's clarify why a new entry is used for
HVMlite to start off with:

1) Xen ABI has historically not wanted to set up the boot params for Linux
guests, instead it insists on letting the Linux kernel Xen boot stubs fill
that out for it. This sticking point has implied the need for a boot stub.
The HVMLite boot entry tries to bring the boot entry paths closer as it
leverages more of the HVM boot path philosophy to mimic the regular PC boot
path.

Is HVMLite supposed to support legacy PV guests as well BTW ?

The reason I'm highlighting the Xen ABI as a *reason* alone is that even with
today's large discrepancy on the old PV boot path I believe we can
bring the boot paths closer together if the Xen ABI was slightly
flexible about this. I've highlighted before how I believe that is possible,
*iff* the Xen ABI would at the very least set 2 things only:

a) Hypervisor type
b) A custom data pointer

This would enable a single boot entry on the guest to handle then:

Pseudo code:

      startup_32()                        startup_64()
           |                                   |
           |                                   |
           V                                   V
pre_hypervisor_stub_32()          pre_hypervisor_stub_64()
           |                                   |
           |                                   |
           V                                   V
[existing startup_32()]           [existing startup_64()]
           |                                   |
           |                                   |
           V                                   V
post_hypervisor_stub_32()         post_hypervisor_stub_64()


If the Xen ABI was flexible about setting a hypervisor type and custom
data pointer then we would have handlers for it, and in them Xen can
do whatever it thinks is needed for its own guest types. It could
also continue to set the zero page on its own as it sees fit.

Again, note that if this is done it could also mean bringing
the old PV boot path closer as well... so this is not just a prospect
for HVMLite but also for old PV guests.

2) Because of 1) we have no formal semantics available for early boot
code, and so severe differences can best be addressed
by yet another boot entry. This has meant often times not addressing
or not knowing if we've addressed real differences between the different
entries. Case in point, dead code [0]. How do we know we will not run
certain code that should not run for the different entries ? Without
*any* semantics later in boot code to distinguish where we came from
and because we strive to build single kernels with different possible
run time environments it means we have tons of code available to
execute / run that we may not need.

Because of the lack of semantics we may still have dead code prospects
with the new HVMLite entry. How are we sure there are no differences ?

[0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html

3) Unikernel / other OS requirements: this is really tied to 2) but even if
we tried to evolve the Xen ABI it would mean considering existing solutions
out there. Things to consider as an example: FreeBSD doesn't have an EFI
entry, unikernels want a simple boot entry.
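
To make the idea in 1) concrete, here is a rough sketch of a single entry that dispatches on a hypervisor type and an opaque data pointer. Nothing here exists today: struct early_boot_info, the HYPER_* values, the stub table and common_entry() are all assumptions made up for illustration, and the real thing would live in early asm / C rather than in a table like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values the hypervisor ABI would set before entry. */
enum hypervisor_type { HYPER_NONE = 0, HYPER_XEN_HVMLITE, HYPER_XEN_PV, HYPER_MAX };

struct early_boot_info {
    uint32_t hypervisor_type;   /* a) hypervisor type */
    void *hypervisor_data;      /* b) custom data pointer, opaque to generic code */
};

struct hypervisor_stub_ops {
    void (*pre)(void *data);    /* runs before the existing startup code */
    void (*post)(void *data);   /* runs after the existing startup code */
};

/* Trace of executed steps, only here so the ordering can be observed. */
enum boot_step { STEP_PRE_STUB, STEP_EXISTING_STARTUP, STEP_POST_STUB };
static enum boot_step trace[8];
static size_t trace_len;

static void record(enum boot_step step)
{
    if (trace_len < sizeof(trace) / sizeof(trace[0]))
        trace[trace_len++] = step;
}

static void xen_hvmlite_pre(void *data)  { (void)data; record(STEP_PRE_STUB); }
static void xen_hvmlite_post(void *data) { (void)data; record(STEP_POST_STUB); }

/* Types without handlers (e.g. bare metal) simply have NULL callbacks. */
static const struct hypervisor_stub_ops stub_table[HYPER_MAX] = {
    [HYPER_XEN_HVMLITE] = { xen_hvmlite_pre, xen_hvmlite_post },
};

static void existing_startup(void) { record(STEP_EXISTING_STARTUP); }

/* One entry for everyone: pre stub, existing startup, post stub. */
static void common_entry(const struct early_boot_info *info)
{
    const struct hypervisor_stub_ops *ops = NULL;

    if (info->hypervisor_type < HYPER_MAX)
        ops = &stub_table[info->hypervisor_type];

    if (ops && ops->pre)
        ops->pre(info->hypervisor_data);
    existing_startup();
    if (ops && ops->post)
        ops->post(info->hypervisor_data);
}
```

The point of the table is only that the difference between guest types becomes explicit data rather than a separate entry point; whether that is workable for Xen is exactly what is under discussion.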

With this in mind, here is what I can think of:

Cons of using the same entry but having two different boot paths:

* Pushes the Xen ABI, needs to make everyone happy, this is hard
* Perhaps harder to implement

Gains of striving to use the same entry but having two different boot paths:

* Helps to share more code easily
* Reduce attack surface
* Requires us to have semantics for early boot; this has a series of
side benefits:
- Means you should try to address differences explicitly rather than
implicitly -- case in point Dead Code
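
As a sketch of what "explicit rather than implicit" could look like for the dead code case: each early init routine declares which boot origins it supports, and anything not declared for the current origin is skipped. All names here (enum boot_origin, struct early_init, run_early_inits()) are hypothetical and only meant to illustrate the idea, not to describe existing kernel code.

```c
#include <stddef.h>

/* Hypothetical boot origins; one bit each so a routine can list several. */
enum boot_origin {
    BOOT_BARE_METAL  = 1u << 0,
    BOOT_XEN_PV      = 1u << 1,
    BOOT_XEN_HVMLITE = 1u << 2,
};

struct early_init {
    const char *name;
    unsigned int supported;     /* mask of boot_origin values */
    void (*fn)(void);
};

static int legacy_probe_calls;
static int common_setup_calls;

static void legacy_probe(void) { legacy_probe_calls++; }
static void common_setup(void) { common_setup_calls++; }

static const struct early_init early_inits[] = {
    /* Assumes legacy PC hardware, so only declared for bare metal. */
    { "legacy_probe", BOOT_BARE_METAL, legacy_probe },
    { "common_setup", BOOT_BARE_METAL | BOOT_XEN_PV | BOOT_XEN_HVMLITE,
      common_setup },
};

/* Run only routines declared for this origin; return how many were skipped. */
static size_t run_early_inits(enum boot_origin origin)
{
    size_t i, skipped = 0;

    for (i = 0; i < sizeof(early_inits) / sizeof(early_inits[0]); i++) {
        if (early_inits[i].supported & origin)
            early_inits[i].fn();
        else
            skipped++;
    }
    return skipped;
}
```

With something along these lines, code that must not run for a given entry is ruled out by declaration instead of by hoping nobody calls it.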

> You still need a way to distinguish between bare metal
> EFI and HVMlite.

Great point! This is the semantics aspect. The new entry for HVMlite approach
deals with this by making the differences implicit by the new entry point.
My call for addressing this through a hypervisor type was to see if we can
get those semantics added explicitly so we can also later address dead
code concerns for the new HVMLite guest type.

Part of my own interest in an EFI entry here is that EFI could be used to help
expand on the semantics in an OS-agnostic form rather than pushing the x86 boot
protocol further. That seems to have its own set of drawbacks though.


> And Xen needs a way to find out whether a kernel is
> supporting HVMlite to boot it in the correct mode.

How was Xen going to find out if new kernels had HVMlite support with the
new entry ? An ELFNOTE() ? If an entry is shared could we not use an
ELFNOTE() for this too ?
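
For reference, looking up such a note is cheap on the loader side. Below is a minimal sketch that walks a buffer holding the contents of a note section and returns the descriptor of the first note with a given name and type. It assumes the usual ELF note layout (three 32-bit words, then name and descriptor each padded to 4 bytes) in host byte order. The helper name find_note() is made up, and which name/type pair would advertise HVMlite support is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF note header: sizes of name and descriptor, then the note type. */
struct note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

/* Name and descriptor are each padded to a 4-byte boundary. */
#define NOTE_ALIGN(x) (((x) + 3u) & ~(size_t)3u)

/*
 * Return a pointer to the descriptor of the first note matching name and
 * type, or NULL if there is none or the buffer is truncated.
 */
static const void *find_note(const uint8_t *buf, size_t len, const char *name,
                             uint32_t type, uint32_t *descsz_out)
{
    size_t off = 0;
    size_t want = strlen(name) + 1;     /* namesz counts the trailing NUL */

    while (len - off >= sizeof(struct note_hdr)) {
        struct note_hdr h;
        size_t name_sz, desc_sz;

        memcpy(&h, buf + off, sizeof(h));
        off += sizeof(h);
        name_sz = NOTE_ALIGN((size_t)h.namesz);
        desc_sz = NOTE_ALIGN((size_t)h.descsz);

        if (name_sz > len - off || desc_sz > len - off - name_sz)
            return NULL;                /* truncated note */

        if (h.type == type && h.namesz == want &&
            memcmp(buf + off, name, want) == 0) {
            *descsz_out = h.descsz;
            return buf + off + name_sz;
        }
        off += name_sz + desc_sz;
    }
    return NULL;
}
```

A shared entry would not change this part: the note only tells the loader what the kernel supports, not which path the kernel takes afterwards.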

Luis

2016-04-13 18:50:14

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 13, 2016 at 11:54:29AM +0200, Roger Pau Monné wrote:
> On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> > OK thanks for the clarification -- still no custom entries for Xen!
> > We should strive for that, at the very least.
> >
> > You do have a point about the legacy stuff. There are two options there:
> >
> > * Fold legacy support under HVMLite -- which seems to be what we
> > currently want to do (we should evaluate the implications and
> > requirements here for that); or
>
> I'm not following here. What does it mean to fold legacy support under
> HVMlite? HVMlite doesn't have any legacy hardware, and that's the issue when
> it comes to using native Linux entry points. Linux might expect some legacy
> PC hardware to be always present, which is not true for HVMlite.
>
> Could you please clarify this point?

It seems there is a confusion on terms used. By folding legacy support under
HVMLite I meant folding legacy PV path (classic PV with PV interfaces) under
HVMlite.

I got the impression that if we wanted to remove the old PV path we had to see
if we can address old classic PV x86 guests through HVMlite, otherwise we'd
have to live with the old PV path for the long term.

> > * Leave legacy stuff on the old PV path; this may be something to
> > bring to the table if we had in place a proactive solution to
> > avoid further fallout from the architecture of the huge differences
> > on the entries. The work I'm doing should help with that. (We should
> > also evaluate the implications and requirements here for that as
> > well).
>
> Classic PV guests don't have legacy hardware at all, they just have PV
> interfaces, so I'm even less sure of what this means.

Using the terms you use by "Leave legacy stuff on the old PV path" I meant
not having to address classic PV guest support through HVMLite.

Luis

2016-04-13 18:54:54

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 13, 2016 at 11:05:00AM +0100, George Dunlap wrote:
> On Tue, Apr 12, 2016 at 11:12 PM, Luis R. Rodriguez <[email protected]> wrote:
> > Also, x86 does have a history of short DT use. Just pointing out that it's there as
> > an option as well. I'll Cc you on some thread about that.
>
> I'm not sure how this is relevant to anything.

You brought up DT as a reason why ARM was able to use the native entry point.
I'm clarifying that nothing restricts the use of DT on x86.

> What we're talking about is how to get from Xen to a point in the
> Linux kernel where everything can Just Work. The proposed feature is
> a mini trampoline that (as I understand it):
> 1. Tells Xen where to jump to (via ELF note)
> 2. Sets up some basic modes and pagetables and then jumps to the zero
> page so Linux can just carry on.

Right, and my goal is to see to it that we do enough homework to
ensure we reviewed all possibilities to share as much code as possible
already and looked at all options before saying we certainly need yet
another entry point. I am not convinced yet this has been done.

Luis

2016-04-13 19:10:16

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 13, 2016 at 12:25:03PM +0200, Roger Pau Monné wrote:
> On Wed, Apr 13, 2016 at 12:12:25AM +0200, Luis R. Rodriguez wrote:
> [...]
> > Also, x86 does have a history of short DT use. Just pointing out that it's there as
> > an option as well. I'll Cc you on some thread about that.
>
> I don't see how this is relevant to the conversation that's going on:

It's relevant as George brought up DT as a *reason* why ARM was able
to cope with no custom entry point...

> How many x86 hardware provide DT?


One. CE4100.

arch/x86/platform/ce4100/falconfalls.dt

> I bet this is 0%.

That's slightly more than 0%.

> How many OSes can boot on x86 using DT? Linux maybe, certainly FreeBSD,
> Windows or OpenBSD won't be able to boot at all when provided a DT on x86.

You guys seem to be taking these things too personally.

Let me repeat, my goal is to ensure we review things without a bias. The points
you make here *now* are things I welcome to the discussion as reasons for
ruling out DT as a way to fine tune further semantics; it is however by no
means something we should have discarded.

> Is Xen going to craft a DT for x86 based on ACPI? No, because it can't parse
> the DSDT or other dynamic tables that contain the information about the
> devices in the system.

Again, DT was brought up by George as reason why ARM was able to cope
with no custom entry point. That's all. What you raise is a good point
to highlight but it does not mean we can't use it if we wanted to for
other things, for instance as an alternative to extending the x86 boot
protocol with custom things which we may need to enhance semantics
early in boot. If that is a stupid prospect let's highlight that and
rule it out.

> I would also like to point out that DT or not DT is not really the problem
> here, the issue that George was trying to point out is that on x86 there's
> some legacy hardware that's considered to be always there, so its presence
> is not signaled by ACPI, and HVMlite is _not_ emulating this hardware. It
> doesn't matter if the hardware description comes from ACPI or DT, this
> hardware is considered to be always present on PC compatible hardware.

x86 Xen PV guests are not alone. I'm adding quirks we can use to address this
in a clean way now which turns out to be very useful for other custom x86
platforms.

Luis

2016-04-13 19:53:05

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Wed, Apr 13, 2016 at 04:44:54PM +0100, George Dunlap wrote:
> On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <[email protected]> wrote:
> > So more to it, if the EFI entry already provides a way into Linux
> > in a more streamlined fashion bringing it closer to the bare metal
> > boot entry, why *would* we add another boot entry to x86, even if
> > its small and self contained ?
>
> We would avoid using EFI if:

And this is what I was looking for, thanks!

> * Being called both on real hardware and under Xen would make the EFI
> entry point more complicated

That's on the EFI Linux maintainer to assess. And he seems willing to
consider this.

> * Adding the necessary EFI support into Xen would be a significant
> chunk of extra work

This seems to be a good sticking point, but Andi noted another aspect
of this or redundancy as well.

> * Requiring PVH mode to implement EFI would make it more difficult for
> other kernels (NetBSD, FreeBSD) to act as dom0s.

What if this is an option only then ?

>
> * Requiring PVH mode to use EFI would make it more difficult to
> support unikernel-style workloads for domUs.

What if this is an option only then ?

> Now as has been pointed out, we don't know for a lot of the above
> things for certain, because nobody has posted any code. None of us
> really want to post any code because:
>
> * Reading and understanding the EFI spec, the Linux EFI path, and
> implementing all that on both the Xen and the Linux side is a lot of
> work
>
> * It looks pretty likely that many of the above things will be true
>
> * The only real objection to the currently proposed solution is really weak.

Not true:

* Avoiding code duplication
* Semantics may be needed anyway


> If you want to post some code I'm sure we could give you feedback on it.

Part of my engagement on HVMLite review is *because* I have been posting
code to help proactively address some old classic PV path issues and
semantics.

I've been addressing semantics on the PV path, and trying to help
bring the classic PV path closer to native entry points while trying
to also provide a proactive measure to help address regressions on the
classic PV path without having Xen be a bottleneck for x86 development.

As for the EFI stuff -- it's only discussion for now, as it'd be pointless to
throw out code if we already know we can't go down a path.

> > Another position against small stubs which I listed myself is that we may need
> > more semantics for early boot even if the new HVMLite small stub is added. This
> > remains to be seen. If we are going to add new semantics, it would seem best to
> > use something more standard like EFI configuration tables rather than hack on
> > to x86 further custom semantics. Custom sloppy semantics have proven to be
> > misused, and were ultimately a sloppy mess.
> [snip]
> >> That sounds like it's going to make the EFI path just as unmanageable as the
> >> current PV path.
> >
> > Can you describe how?
> >
> >> Using the EFI entry point would certainly make sense if it was
> >> actually simpler than the proposed extra entry point. But it sounds
> >> like it's going to be more complicated, not only for Xen, but also for
> >> Linux.
> >
> > How so? Please provide specifics.
>
> Here is the juxtaposition that confuses me. The problem with a lot of
> the current code is that you have virtualization-specific hacks all
> over the place making things complicated.

That's because of sloppy solutions.

> And in the first quote
> above, you seem afraid that the extra entry point with stub code will
> somehow be misused and end up in a similar "sloppy mess", even though
> it's not at all clear how *having a stub entry point* could be
> "abused" by anyone.

You seem to be missing the points I've raised to Boris about semantics
and requirements for custom platform stuff.

> But then when I suggest that sharing a codepath
> between systems that have actual EFI firmware, with platform hardware,
> and a system that has no EFI firmware and no similar concept of the
> hardware, might end up a sloppy mess of Xen-specific if clauses and
> maintenance headaches due to broken assumptions, it doesn't even
> register with you as a reasonable concern?

Quite the contrary! It does, the question is how we are going to address
the semantics clearly. EFI seemed to provide an OS agnostic way to
address some of this through configuration tables, which would mean
not having to extend the old x86 boot protocol further. More to the
point, this is beyond x86, if we are going to be striving to unify
entry points on Linux across architectures in the long term why not
start addressing needed semantics for virtualization through more
standard means now?
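
To illustrate what I mean by configuration tables: the EFI system table carries an array of (GUID, pointer) pairs, and a consumer looks up the table it cares about by GUID. A minimal sketch follows. The struct layouts mirror the UEFI spec's EFI_GUID and EFI_CONFIGURATION_TABLE, but the GUID value, struct hv_boot_semantics and find_config_table() are made up for illustration; no such table is defined anywhere today.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layouts mirror the UEFI spec's EFI_GUID and EFI_CONFIGURATION_TABLE. */
struct efi_guid {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
};

struct efi_config_table {
    struct efi_guid vendor_guid;
    void *vendor_table;
};

/* Hypothetical payload a hypervisor could publish for early boot code. */
struct hv_boot_semantics {
    uint32_t hypervisor_type;
    uint32_t flags;
};

/* Hypothetical GUID, invented for this sketch only. */
static const struct efi_guid HV_SEMANTICS_GUID = {
    0x12345678, 0x1234, 0x5678,
    { 0x90, 0xab, 0xcd, 0xef, 0x01, 0x23, 0x45, 0x67 }
};

/* Linear search of the configuration table array by vendor GUID. */
static void *find_config_table(const struct efi_config_table *tables,
                               size_t count, const struct efi_guid *guid)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (memcmp(&tables[i].vendor_guid, guid, sizeof(*guid)) == 0)
            return tables[i].vendor_table;
    }
    return NULL;
}
```

If the table is absent the lookup returns NULL and the kernel carries on as on bare metal, which is the kind of explicit, OS agnostic semantics I'm after.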

> As Matt said, nobody will be able to provide specifics until someone
> tries to code it up. But coding things up is not free.

And he is, but privately shared so far. We still can benefit from
more architectural discussion over these things.

Luis

2016-04-14 09:42:31

by George Dunlap

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On 13/04/16 19:54, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 11:05:00AM +0100, George Dunlap wrote:
>> On Tue, Apr 12, 2016 at 11:12 PM, Luis R. Rodriguez <[email protected]> wrote:
>>> Also, x86 does have a history of short DT use. Just pointing out that it's there as
>>> an option as well. I'll Cc you on some thread about that.
>>
>> I'm not sure how this is relevant to anything.
>
> You brought up DT as a reason why ARM was able to use the native entry point.
> I'm clarifying that nothing restricts the use of DT on x86.

No, DT isn't the reason Xen is able to use the native entry point on
ARM. The reason is, to quote myself: "there are no assumptions made
about what hardware is or is not present on the system -- everything
that needs to be communicated about what is or is not present can be
passed in DT."

So that's four things:
1. DT is available to be used
2. DT is expected as the main thing that entry point accepts
3. There are no assumptions about what hardware is or is not present in
the system
4. Everything that needs to be communicated about what is or is not
present can be passed in DT.

Are #2, #3, and #4 true on x86? If not then #1 is irrelevant.

[snip from another thread]

> One. CE4100.
>
> arch/x86/platform/ce4100/falconfalls.dt

You CC'd me on some patches related to that. I don't know anything
about the code, but it looked like CE4100 is a subarch, and in response
to that thread Ingo specifically asked you to add a comment saying
basically "Don't add any more subarches".

And not only that, but the ugly, nasty legacy PV boot path we're trying
to get rid of IS ALSO A SUBARCH. So instead of a quick stub with an
extra EFI flag, you're proposing we consider adding yet another Xen PV subarch?

>> What we're talking about is how to get from Xen to a point in the
>> Linux kernel where everything can Just Work. The proposed feature is
>> a mini trampoline that (as I understand it):
>> 1. Tells Xen where to jump to (via ELF note)
>> 2. Sets up some basic modes and pagetables and then jumps to the zero
>> page so Linux can just carry on.
>
> Right, and my goal is to see to it that we do enough homework to
> ensure we reviewed all possibilities to share as much code as possible
> already and looked at all options before saying we certainly need yet
> another entry point. I am not convinced yet this has been done.

I think we have different ideas about what an appropriate amount of
homework is. :-) Everything you've put forward has been given
consideration and judged unlikely to be promising; and your suggestions
for further possibilities (like this one) keep getting more and more
obviously unsuitable. We shouldn't be required to actually post code
for every single other option just to prove how ugly they are,
particularly when there's nothing particularly wrong with the code we have.

-George

2016-04-14 09:53:55

by George Dunlap

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On 13/04/16 20:52, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 04:44:54PM +0100, George Dunlap wrote:
>> On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <[email protected]> wrote:
>>> So more to it, if the EFI entry already provides a way into Linux
>>> in a more streamlined fashion bringing it closer to the bare metal
>>> boot entry, why *would* we add another boot entry to x86, even if
>>> its small and self contained ?
>>
>> We would avoid using EFI if:
>
> And this is what I was looking for, thanks!
>
>> * Being called both on real hardware and under Xen would make the EFI
>> entry point more complicated
>
> That's on the EFI Linux maintainer to assess. And he seems willing to
> consider this.
>
>> * Adding the necessary EFI support into Xen would be a significant
>> chunk of extra work
>
> This seems to be a good sticking point, but Andi noted another aspect
> of this or redundancy as well.
>
>> * Requiring PVH mode to implement EFI would make it more difficult for
>> other kernels (NetBSD, FreeBSD) to act as dom0s.
>
> What if this is an option only then ?
>
>>
>> * Requiring PVH mode to use EFI would make it more difficult to
>> support unikernel-style workloads for domUs.
>
> What if this is an option only then ?

So first of all, you asked why anyone would oppose EFI, and this is part
of the answer to that.

Secondly, you mean "What if this is the only thing the Linux maintainers
will accept?" And you already know the answer to that.

How much of a burden it would be on the rest of the open-source
ecosystem (Xen, *BSDs, &c) is a combination of some as-yet unknown facts
(i.e., what a minimal Xen/Linux EFI interface would look like) and a
matter of judgement (i.e., given the same interface, reasonable people
may come to different conclusions about whether the interface is an
undue burden to impose on others or not).

But I would hope that the Linux maintainers would at least consider the
broader community when weighing their decisions, and not take advantage
of their position of dominance to simply ignore the effect of their
choices on everybody else.

-George

2016-04-14 19:44:13

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Thu, Apr 14, 2016 at 10:53:47AM +0100, George Dunlap wrote:
> On 13/04/16 20:52, Luis R. Rodriguez wrote:
> > On Wed, Apr 13, 2016 at 04:44:54PM +0100, George Dunlap wrote:
> >> On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <[email protected]> wrote:
> >>> So more to it, if the EFI entry already provides a way into Linux
> >>> in a more streamlined fashion bringing it closer to the bare metal
> >>> boot entry, why *would* we add another boot entry to x86, even if
> >>> its small and self contained ?
> >>
> >> We would avoid using EFI if:
> >
> > And this is what I was looking for, thanks!
> >
> >> * Being called both on real hardware and under Xen would make the EFI
> >> entry point more complicated
> >
> > That's on the EFI Linux maintainer to assess. And he seems willing to
> > consider this.
> >
> >> * Adding the necessary EFI support into Xen would be a significant
> >> chunk of extra work
> >
> > This seems to be a good sticking point, but Andi noted another aspect
> > of this or redundancy as well.
> >
> >> * Requiring PVH mode to implement EFI would make it more difficult for
> >> other kernels (NetBSD, FreeBSD) to act as dom0s.
> >
> > What if this is an option only then ?
> >
> >>
> >> * Requiring PVH mode to use EFI would make it more difficult to
> >> support unikernel-style workloads for domUs.
> >
> > What if this is an option only then ?
>
> So first of all, you asked why anyone would oppose EFI, and this is part
> of the answer to that.
>
> Secondly, you mean "What if this is the only thing the Linux maintainers
> will accept?" And you already know the answer to that.

No, I meant to ask, would it be possible to make booting HVMLite using EFI
be optional ? That way if you already support EFI that can be used on
your entries with some small modifications.

> How much of a burden it would be on the rest of the open-source
> ecosystem (Xen, *BSDs, &c) is a combination of some as-yet unknown facts
> (i.e., what a minimal Xen/Linux EFI interface would look like) and a
> matter of judgement (i.e., given the same interface, reasonable people
> may come to different conclusions about whether the interface is an
> undue burden to impose on others or not).
>
> But I would hope that the Linux maintainers would at least consider the
> broader community when weighing their decisions, and not take advantage
> of their position of dominance to simply ignore the effect of their
> choices on everybody else.

This has nothing to do with dominance or anything nefarious, I'm asking
simply for a full engineering evaluation of all possibilities, with
the long term in mind. Not for now, but for hardware assumptions which
are sensible 5 years from now.

Luis

2016-04-14 19:59:08

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Thu, Apr 14, 2016 at 10:42:15AM +0100, George Dunlap wrote:
> On 13/04/16 19:54, Luis R. Rodriguez wrote:
> > On Wed, Apr 13, 2016 at 11:05:00AM +0100, George Dunlap wrote:
> >> On Tue, Apr 12, 2016 at 11:12 PM, Luis R. Rodriguez <[email protected]> wrote:
> >>> Also, x86 does have a history of short DT use. Just pointing out that it's there as
> >>> an option as well. I'll Cc you on some thread about that.
> >>
> >> I'm not sure how this is relevant to anything.
> >
> > You brought up DT as a reason why ARM was able to use the native entry point.
> > I'm clarifying that nothing restricts the use of DT on x86.
>
> No, DT isn't the reason Xen is able to use the native entry point on
> ARM. The reason is, to quote myself: "there are no assumptions made
> about what hardware is or is not present on the system -- everything
> that needs to be communicated about what is or is not present can be
> passed in DT."
>
> So that's four things:
> 1. DT is available to be used
> 2. DT is expected as the main thing that entry point accepts
> 3. There are no assumptions about what hardware is or is not present in
> the system
> 4. Everything that needs to be communicated about what is or is not
> present can be passed in DT.
>
> Are #2, #3, and #4 true on x86? If not then #1 is irrelevant.

2) Obviously not, but it can be used.
3) We're getting close to that, see the platform legacy work [0],
that should help us mesh things into a generic form that we
didn't have before. There may be others, as is being discussed.
If you have other ideas now would be great to hear of them.
4) we have ACPI to fill in the gaps these days for not only x86
but also ARM, as such I think it makes sense to only use DT
when it makes sense and to standardize on ACPI when possible

[0] http://lkml.kernel.org/r/[email protected]

> [snip from another thread]
>
> > One. CE4100.
> >
> > arch/x86/platform/ce4100/falconfalls.dt
>
> You CC'd me on some patches related to that. I don't know anything
> about the code, but it looked like CE4100 is a subarch, and in response
> to that thread Ingo specifically asked you to add a comment saying
> basically "Don't add any more subarches".

Yeap!

> And not only that, but the ugly, nasty legacy PV boot path we're trying
> to get rid of IS ALSO A SUBARCH. So instead of a quick stub with an
> extra EFI flag, you're proposing we consider adding yet another Xen PV subarch?

A little while ago I brought that up as a possibility, given that the
semantics of use of the subarch were also loose... hence the discussion
over that, and now a patch that helps clarify the use as you were
Cc'd on.

What's been decided is that we should not extend the subarch, however
if we need a hypervisor type that's a separate topic and we would need
to address that separately. It's possible. I find it sensible, especially if
the goal is to avoid more sporadic entries on Linux and to help with
early boot semantics / addressing dead code prospects.

EFI is another option which already has code and an entry and its
why I've asked us to consider it. So we should probably not really
try to look at adding a hypervisor type until we've really decided
that EFI is a no go at all and makes no sense.

IMHO we should add new entries to x86 linux only as a last resort measure.

> >> What we're talking about is how to get from Xen to a point in the
> >> Linux kernel where everything can Just Work. The proposed feature is
> >> a mini trampoline that (as I understand it):
> >> 1. Tells Xen where to jump to (via ELF note)
> >> 2. Sets up some basic modes and pagetables and then jumps to the zero
> >> page so Linux can just carry on.
> >
> > Right, and my goal is to see to it we do enough homework to
> > ensure we reviewed all possibilities to share as much code as possible
> > already and looked at all options before saying we certainly need yet
> > another entry point. I am not convinced yet this has been done.
>
> I think we have different ideas about what an appropriate amount of
> homework is. :-) Everything you've put forward has been given
> consideration and judged unlikely to be promising;

That's fine I'm not afraid of suggestions to be discarded, my goal
is to evaluate all possibilities from an engineering point of
view, and then make decisions.

> and your suggestions for further possibilities (like this one) keep getting
> more and more obviously unsuitable.

Really ? If it wasn't for me looking into the paravirt crap you'd
end up likely with some other semantic mess. If you'd really like
me to stop chiming in let me know and I'll look away from Xen for
good like others have.

> We shouldn't be required to actually post code
> for every single other option just to prove how ugly they are,
> particularly when there's nothing particularly wrong with the code we have.

I'm not asking that. I'm asking for an engineering evaluation. That's very
different. I am going to the Xen Hackathon after all as well, not sure what
else to tell you to show you I'm only after the best engineering solution and
it seems we could do much better here.

Luis

2016-04-14 20:39:54

by Konrad Rzeszutek Wilk

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

> This has nothing to do with dominance or anything nefarious, I'm asking
> simply for a full engineering evaluation of all possibilities, with
> the long term in mind. Not for now, but for hardware assumptions which
> are sensible 5 years from now.

There are two different things in my mind about this conversation:

1). semantics of low-level code wrapped around pvops. On baremetal
it is easy - just look at Intel and AMD SDM.
And this is exactly what running in HVM or HVMLite mode will do -
all those low-level operations will have the same exact semantic
as baremetal.

There is no hope for the pv_ops to fix that.

And I am pretty sure the HVMLite in 5 years will have no
trouble in this as it will be running in VMX mode (HVM).

2). Boot entry.

The semantics on Linux are well known - they are documented in
Documentation/x86/boot.txt.

HVMLite Linux guests have to somehow provide that.

And how it is done seems to be tied around:

a) Use existing boot paths - which means making some
extra stub code to call in those existing boot paths
(for example Xen could bundle with a GRUB2-alike
code to be run when booting Linux using that boot-path).

Or EFI (for a ton more code). Granted not all OSes
support those, so not very OS agnostic.

Hard part - if the bootparams change then have to
rev up the code in there. May be out of sync
with Linux bootparams.

b) Add another simpler boot entry point which has to copy
"some" strings from its format in bootparams.


So this part of the discussion does not fall in the
hardware assumptions. Intel SDM or AMD mention nothing about
boot loaders or how to boot an OS - that is all in realms
of how software talks to software.

3). And there is the discussion on man-power to make this
happen.

4). Lastly which one is simpler and involves less code so
that there is less chance of bitrot.
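
Option (b) under point 2 can be pictured with a small C sketch. This is only an illustration under stated assumptions: `toy_start_info`, `toy_boot_params`, `toy_fill_boot_params` and `TOY_CMDLINE_MAX` are hypothetical names invented here, not the layouts Xen or Linux actually use. The point is just that the new entry copies a handful of values from the hypervisor's format into what the existing boot path expects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_CMDLINE_MAX 256   /* hypothetical limit, for illustration only */

/* Hypothetical: what the hypervisor hands to the new entry point. */
struct toy_start_info {
    const char *cmdline;      /* NUL-terminated command line, may be NULL */
    uint64_t initrd_addr;     /* physical address of the ramdisk */
    uint64_t initrd_size;     /* size of the ramdisk in bytes */
};

/* Hypothetical: simplified stand-in for the kernel's boot parameters. */
struct toy_boot_params {
    char cmdline[TOY_CMDLINE_MAX];
    uint64_t ramdisk_image;
    uint64_t ramdisk_size;
};

/* Copy the few fields the existing boot path needs.
 * Returns 0 on success, -1 if the command line does not fit. */
int toy_fill_boot_params(const struct toy_start_info *si,
                         struct toy_boot_params *bp)
{
    assert(si != NULL && bp != NULL);

    size_t n = si->cmdline ? strlen(si->cmdline) : 0;
    if (n >= TOY_CMDLINE_MAX)
        return -1;

    memset(bp, 0, sizeof(*bp));          /* also NUL-terminates cmdline */
    if (n > 0)
        memcpy(bp->cmdline, si->cmdline, n);
    bp->ramdisk_image = si->initrd_addr;
    bp->ramdisk_size = si->initrd_size;
    return 0;
}
```

The sync concern raised under (a) applies to a stub like this as well: if the consumer's structure grows a field, the copy routine has to be revised with it.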

2016-04-14 21:12:12

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Thu, Apr 14, 2016 at 04:38:47PM -0400, Konrad Rzeszutek Wilk wrote:
> > This has nothing to do with dominance or anything nefarious, I'm asking
> > simply for a full engineering evaluation of all possibilities, with
> > the long term in mind. Not for now, but for hardware assumptions which
> > are sensible 5 years from now.
>
> There are two different things in my mind about this conversation:
>
> 1). semantics of low-level code wrapped around pvops. On baremetal
> it is easy - just look at Intel and AMD SDM.
> And this is exactly what running in HVM or HVMLite mode will do -
> all those low-level operations will have the same exact semantic
> as baremetal.

Today Linux is KVM stupid for early boot code. I've pointed this out
before, but again, there has been no reason found to need this. Perhaps
for HVMLite we won't need this...

> There is no hope for the pv_ops to fix that.

Actually I beg to differ. See my patches and ongoing work.

> And I am pretty sure the HVMLite in 5 years will have no
> trouble in this as it will be running in VMX mode (HVM).

HVMLite may still use PV drivers for some things, it's not super
obvious to me that low level semantics will not be needed yet.

> 2). Boot entry.
>
> The semantics on Linux are well known - they are documented in
> Documentation/x86/boot.txt.
>
> HVMLite Linux guests have to somehow provide that.
>
> And how it is done seems to be tied around:
>
> a) Use existing boot paths - which means making some
> extra stub code to call in those existing boot paths
> (for example Xen could bundle with a GRUB2-alike
> code to be run when booting Linux using that boot-path).
>
> Or EFI (for a ton more code). Granted not all OSes
> support those, so not very OS agnostic.

What other OSes do is something to consider but if they don't
do it because they are slacking in one domain should by no means
be a reason to not evaluate the long term possible gains.
Especially if we have reasons to believe more architectures will
consider it and standardize on it.

It'd be silly not to take this a bit more seriously.

> Hard part - if the bootparams change then have to
> rev up the code in there. May be out of sync
> with Linux bootparams.

If we are going to ultimately standardize on EFI boot for new
hardware it'd be rather silly to extend the boot params further.

> b) Add another simpler boot entry point which has to copy
> "some" strings from its format in bootparams.
>
>
> So this part of the discussion does not fall in the
> hardware assumptions. Intel SDM or AMD mention nothing about
> boot loaders or how to boot an OS - that is all in realms
> of how software talks to software.

Right -- so one question to ask here is what other uses are there
for this outside of say HVMLite. You mentioned Multiboot so far.

> 3). And there is the discussion on man-power to make this
> happen.

Sure.

> 4). Lastly which one is simpler and involves less code so
> that there is less chance of bitrot.

Indeed.

You also forgot the tie-in between dead-code and semantics but
that clearly is not on your mind. But I'd say this is a good
summary.

Luis

2016-04-15 02:16:26

by Konrad Rzeszutek Wilk

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Thu, Apr 14, 2016 at 11:12:01PM +0200, Luis R. Rodriguez wrote:
> On Thu, Apr 14, 2016 at 04:38:47PM -0400, Konrad Rzeszutek Wilk wrote:
> > > This has nothing to do with dominance or anything nefarious, I'm asking
> > > simply for a full engineering evaluation of all possibilities, with
> > > the long term in mind. Not for now, but for hardware assumptions which
> > > are sensible 5 years from now.
> >
> > There are two different things in my mind about this conversation:
> >
> > 1). semantics of low-level code wrapped around pvops. On baremetal
> > it is easy - just look at Intel and AMD SDM.
> > And this is exactly what running in HVM or HVMLite mode will do -
> > all those low-level operations will have the same exact semantic
> > as baremetal.
>
> Today Linux is KVM stupid for early boot code. I've pointed this out

-EPARSE?
> before, but again, there has been no reason found to need this. Perhaps
> for HVMLite we won't need this...

Are you talking about kvmtools? Which BTW are similar to how HVMLite
would expose the platform.
>
> > There is no hope for the pv_ops to fix that.
>
> Actually I beg to differ. See my patches and ongoing work.

I meant in terms of semantics. As in I cannot see some of
those pv-ops to have the same semantics as baremetal. For example
set_pte is simple on x86 (movq $<some value>, <memory address>).

While on Xen PV it is a potential batching hypercall with
lookup in an P2M table, then perhaps a sidelong look at
the M2P, then maybe the M2P override.
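
A rough C sketch of that contrast, not the kernel's actual code: `native_set_pte` is a single store, while the hypothetical `pv_set_pte` translates the frame number through a toy P2M array and queues the update so that several can be applied by one simulated hypercall. Every name, the table contents and the batch size are assumptions made for illustration; the M2P and M2P-override lookups are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PAGE_SHIFT     12
#define PTE_FLAGS_MASK 0xfffULL
#define P2M_ENTRIES    8
#define BATCH_MAX      4

/* Bare metal: setting a PTE is one memory store. */
void native_set_pte(pte_t *ptep, pte_t val)
{
    *ptep = val;
}

/* Toy P2M table: guest pseudo-physical frame -> machine frame. */
static const uint64_t p2m[P2M_ENTRIES] = {
    100, 101, 205, 206, 300, 301, 302, 303
};

struct pte_update {
    pte_t *ptep;
    pte_t val;
};

static struct pte_update batch[BATCH_MAX];
static size_t batch_len;
unsigned hypercalls;          /* number of simulated hypercalls issued */

/* Stand-in for a multi-entry MMU update hypercall: apply queued updates. */
void flush_batch(void)
{
    if (batch_len == 0)
        return;
    for (size_t i = 0; i < batch_len; i++)
        *batch[i].ptep = batch[i].val;
    batch_len = 0;
    hypercalls++;
}

/* PV-style: translate PFN to MFN, then queue instead of storing directly. */
void pv_set_pte(pte_t *ptep, pte_t val)
{
    uint64_t pfn = val >> PAGE_SHIFT;
    pte_t flags = val & PTE_FLAGS_MASK;

    assert(pfn < P2M_ENTRIES);           /* toy table only covers 8 frames */

    batch[batch_len].ptep = ptep;
    batch[batch_len].val = (p2m[pfn] << PAGE_SHIFT) | flags;
    batch_len++;

    if (batch_len == BATCH_MAX)          /* queue full: issue the "hypercall" */
        flush_batch();
}
```

With this toy model a caller of `pv_set_pte` does not see the new value in the page table until `flush_batch` runs, and the value that lands there carries the machine frame rather than the frame the caller passed in. Neither is true of the plain store.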

>
> > And I am pretty sure the HVMLite in 5 years will have no
> > trouble in this as it will be running in VMX mode (HVM).
>
> HVMLite may still use PV drivers for some things, it's not super
> obvious to me that low level semantics will not be needed yet.

PV drivers are very different from low-level semantics.

And it will have to use them.

Maybe it is easier to think of this in terms of kvmtool - it
is pretty much how this would work - but instead of VirtIO
drivers you would be using the Xen PV drivers (though one
could also use VirtIO ones if you wanted).
>
> > 2). Boot entry.
> >
> > The semantics on Linux are well known - they are documented in
> > Documentation/x86/boot.txt.
> >
> > HVMLite Linux guests have to somehow provide that.
> >
> > And how it is done seems to be tied around:
> >
> > a) Use existing boot paths - which means making some
> > extra stub code to call in those existing boot paths
> > (for example Xen could bundle with a GRUB2-alike
> > code to be run when booting Linux using that boot-path).
> >
> > Or EFI (for a ton more code). Granted not all OSes
> > support those, so not very OS agnostic.
>
> What other OSes do is something to consider but if they don't
> do it because they are slacking in one domain should by no means
> be a reason to not evaluate the long term possible gains.
> Especially if we have reasons to believe more architectures will
> consider it and standardize on it.
>
> It'd be silly not to take this a bit more seriously.

Complexity vs simplicity.
>
> > Hard part - if the bootparams change then have to
> > rev up the code in there. May be out of sync
> > with Linux bootparams.
>
> If we are going to ultimately standardize on EFI boot for new
> hardware it'd be rather silly to extend the boot params further.

Whoa there... Have you spoken to hpa, tglx about this?

>
> > b) Add another simpler boot entry point which has to copy
> > "some" strings from its format in bootparams.
> >
> >
> > So this part of the discussion does not fall in the
> > hardware assumptions. Intel SDM or AMD mention nothing about
> > boot loaders or how to boot an OS - that is all in realms
> > of how software talks to software.
>
> Right -- so one question to ask here is what other uses are there
> for this outside of say HVMLite. You mentioned Multiboot so far.
>
> > 3). And there is the discussion on man-power to make this
> > happen.
>
> Sure.
>
> > 4). Lastly which one is simpler and involves less code so
> > that there is less chance of bitrot.
>
> Indeed.
>
> You also forgot the tie-in between dead-code and semantics but

Wait, I just spoke about CPU semantics?! Which semantics
are you talking about?
> that clearly is not on your mind. But I'd say this is a good
> summary.

I put 'dead code' in the same realm as device drivers work.
And they seem to always have some issue or another.
Or maybe I am getting unlucky and getting copied on those bugs.
>
> Luis

2016-04-15 05:50:30

by Jürgen Groß

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On 14/04/16 21:44, Luis R. Rodriguez wrote:
> On Thu, Apr 14, 2016 at 10:53:47AM +0100, George Dunlap wrote:
>> On 13/04/16 20:52, Luis R. Rodriguez wrote:
>>> On Wed, Apr 13, 2016 at 04:44:54PM +0100, George Dunlap wrote:
>>>> On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <[email protected]> wrote:
>>>>> So more to it, if the EFI entry already provides a way into Linux
>>>>> in a more streamlined fashion bringing it closer to the bare metal
>>>>> boot entry, why *would* we add another boot entry to x86, even if
>>>>> it's small and self-contained ?
>>>>
>>>> We would avoid using EFI if:
>>>
>>> And this is what I was looking for, thanks!
>>>
>>>> * Being called both on real hardware and under Xen would make the EFI
>>>> entry point more complicated
>>>
>>> That's on the EFI Linux maintainer to assess. And he seems willing to
>>> consider this.
>>>
>>>> * Adding the necessary EFI support into Xen would be a significant
>>>> chunk of extra work
>>>
>>> This seems to be a good sticking point, but Andi noted another aspect
>>> of this or redundancy as well.
>>>
>>>> * Requiring PVH mode to implement EFI would make it more difficult for
>>>> other kernels (NetBSD, FreeBSD) to act as dom0s.
>>>
>>> What if this is an option only then ?
>>>
>>>>
>>>> * Requiring PVH mode to use EFI would make it more difficult to
>>>> support unikernel-style workloads for domUs.
>>>
>>> What if this is an option only then ?
>>
>> So first of all, you asked why anyone would oppose EFI, and this is part
>> of the answer to that.
>>
>> Secondly, you mean "What if this is the only thing the Linux maintainers
>> will accept?" And you already know the answer to that.
>
> No, I meant to ask, would it be possible to make booting HVMLite using EFI
> be optional ? That way if you already support EFI that can be used on
> your entries with some small modifications.

So you suggest to add two HVMlite modes regarding boot interface
instead of one?

I still have the impression you are suggesting by using the same entry
everything is solved in the OS. You still need the support of HVMlite
especially in the early boot path to make sure the OS won't try to use
the complete EFI standard.

>
>> How much of a burden it would be on the rest of the open-source
>> ecosystem (Xen, *BSDs, &c) is a combination of some as-yet unknown facts
>> (i.e., what a minimal Xen/Linux EFI interface would look like) and a
>> matter of judgement (i.e., given the same interface, reasonable people
>> may come to different conclusions about whether the interface is an
>> undue burden to impose on others or not).
>>
>> But I would hope that the Linux maintainers would at least consider the
>> broader community when weighing their decisions, and not take advantage
>> of their position of dominance to simply ignore the effect of their
>> choices on everybody else.
>
> This has nothing to do with dominance or anything nefarious, I'm asking
> simply for a full engineering evaluation of all possibilities, with
> the long term in mind. Not for now, but for hardware assumptions which
> are sensible 5 years from now.

No, they are not.

Given how long the EFI standard is available now and how buggy many
vendors' implementations are I don't expect all computers sold in 5
years will have a usable EFI. This will be true especially for
consumer devices where no EFI is available today.


Juergen

2016-04-15 09:59:34

by George Dunlap

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On 14/04/16 20:44, Luis R. Rodriguez wrote:
> On Thu, Apr 14, 2016 at 10:53:47AM +0100, George Dunlap wrote:
>> On 13/04/16 20:52, Luis R. Rodriguez wrote:
>>> On Wed, Apr 13, 2016 at 04:44:54PM +0100, George Dunlap wrote:
>>>> On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <[email protected]> wrote:
>>>>> So more to it, if the EFI entry already provides a way into Linux
>>>>> in a more streamlined fashion bringing it closer to the bare metal
>>>>> boot entry, why *would* we add another boot entry to x86, even if
>>>>> it's small and self-contained ?
>>>>
>>>> We would avoid using EFI if:
>>>
>>> And this is what I was looking for, thanks!
>>>
>>>> * Being called both on real hardware and under Xen would make the EFI
>>>> entry point more complicated
>>>
>>> That's on the EFI Linux maintainer to assess. And he seems willing to
>>> consider this.
>>>
>>>> * Adding the necessary EFI support into Xen would be a significant
>>>> chunk of extra work
>>>
>>> This seems to be a good sticking point, but Andi noted another aspect
>>> of this or redundancy as well.
>>>
>>>> * Requiring PVH mode to implement EFI would make it more difficult for
>>>> other kernels (NetBSD, FreeBSD) to act as dom0s.
>>>
>>> What if this is an option only then ?
>>>
>>>>
>>>> * Requiring PVH mode to use EFI would make it more difficult to
>>>> support unikernel-style workloads for domUs.
>>>
>>> What if this is an option only then ?
>>
>> So first of all, you asked why anyone would oppose EFI, and this is part
>> of the answer to that.
>>
>> Secondly, you mean "What if this is the only thing the Linux maintainers
>> will accept?" And you already know the answer to that.
>
> No, I meant to ask, would it be possible to make booting HVMLite using EFI
> be optional ? That way if you already support EFI that can be used on
> your entries with some small modifications.

Oh -- I read both those lines as, "What if this is *the only option*
then?" (which I then interpreted to mean, what if booting EFI is the
only thing Linux will accept). The rest of my reply is based on that
misunderstanding. Sorry about that.

Regarding the second one -- I wasn't talking about actual non-Linux
unikernels; I was talking about using Linux in the way that unikernels
are used ("unikernel-style"). That is, you boot a minimal Linux image
with a small ramdisk and have a single process running as init. For
this use case, even an extra megabyte of guest RAM and an extra second
of boot time is a significant cost. "Use OVMF for domUs" is an
excellent solution for traditional VMs where you boot a full distro, but
would impose a significant cost on using Linux in unikernel-style VMs.

Whether a stripped-down EFI support would be sufficiently low memory /
latency for such workloads is an open question that would take time and
engineering effort to discover. And in any case, it would certainly
require the maintenance of Yet Another Bootloader in the Xen source tree.

-George

2016-04-15 15:24:21

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Fri, Apr 15, 2016 at 07:50:25AM +0200, Juergen Gross wrote:
> On 14/04/16 21:44, Luis R. Rodriguez wrote:
> > No, I meant to ask, would it be possible to make booting HVMLite using EFI
> > be optional ? That way if you already support EFI that can be used on
> > your entries with some small modifications.
>
> So you suggest to add two HVMlite modes regarding boot interface
> instead of one?

Not suggest, I'm evaluating what options we have available. That's very
different from suggesting. That's the point to this whole topic, pure and
simple evaluation of options.

> Given how long the EFI standard is available now and how buggy many
> vendors' implementations are I don't expect all computers sold in 5
> years will have a usable EFI. This will be true especially for
> consumer devices where no EFI is available today.

Thanks this really helps.

Luis

2016-04-15 15:30:32

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Fri, Apr 15, 2016 at 10:59:16AM +0100, George Dunlap wrote:
> On 14/04/16 20:44, Luis R. Rodriguez wrote:
> > No, I meant to ask, would it be possible to make booting HVMLite using EFI
> > be optional ? That way if you already support EFI that can be used on
> > your entries with some small modifications.
>
> I wasn't talking about actual non-Linux unikernels; I was talking about using
> Linux in the way that unikernels are used ("unikernel-style"). That is, you
> boot a minimal Linux image with a small ramdisk and have a single process
> running as init. For this use case, even an extra megabyte of guest RAM and
> an extra second of boot time is a significant cost. "Use OVMF for domUs" is
> an excellent solution for traditional VMs where you boot a full distro, but
> would impose a significant cost on using Linux in unikernel-style VMs.

Understood.

> Whether a stripped-down EFI support would be sufficiently low memory /
> latency for such workloads is an open question that would take time and
> engineering effort to discover. And in any case, it would certainly
> require the maintenance of Yet Another Bootloader in the Xen source tree.

OVMF is used by ARM, so using it should be a matter of adaptation, and
some changes other than perhaps DT use. Question still stands though,
would it be possible to have HVMLite be using EFI as an option so that
some users could opt-in if they so wish ?

To be clear, at this point I am not suggesting this be done, just evaluating
the options available.

Luis

2016-04-15 16:07:08

by George Dunlap

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On 15/04/16 16:30, Luis R. Rodriguez wrote:
> On Fri, Apr 15, 2016 at 10:59:16AM +0100, George Dunlap wrote:
>> On 14/04/16 20:44, Luis R. Rodriguez wrote:
>>> No, I meant to ask, would it be possible to make booting HVMLite using EFI
>>> be optional ? That way if you already support EFI that can be used on
>>> your entries with some small modifications.
>>
>> I wasn't talking about actual non-Linux unikernels; I was talking about using
>> Linux in the way that unikernels are used ("unikernel-style"). That is, you
>> boot a minimal Linux image with a small ramdisk and have a single process
>> running as init. For this use case, even an extra megabyte of guest RAM and
>> an extra second of boot time is a significant cost. "Use OVMF for domUs" is
>> an excellent solution for traditional VMs where you boot a full distro, but
>> would impose a significant cost on using Linux in unikernel-style VMs.
>
> Understood.
>
>> Whether a stripped-down EFI support would be sufficiently low memory /
>> latency for such workloads is an open question that would take time and
>> engineering effort to discover. And in any case, it would certainly
>> require the maintenance of Yet Another Bootloader in the Xen source tree.
>
> OVMF is used by ARM, so using it should be a matter of adaptation, and
> some changes other than perhaps DT use. Question still stands though,
> would it be possible to have HVMLite be using EFI as an option so that
> some users could opt-in if they so wish ?

Well we definitely intend to have a mode of PVH* which boots OVMF to
EFI-enabled guests, if that's what you mean. For one thing, that should
in theory allow us to boot Windows guests without needing to spin up
qemu to emulate any devices (since OVMF will be able to access the PV
devices until the Windows PV drivers come up). Booting to EFI-enabled
distros is certainly something we want as well.

But we need an option for dom0, and ideally we'd like an option for
lightweight Linux guests. It's using EFI for those purposes that we're
pushing back on.

-George

* I'm saying PVH because I hope when everything is sorted out we can
just call HVMLite PVH again.

2016-04-15 17:17:18

by Luis Chamberlain

Subject: Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

On Fri, Apr 15, 2016 at 05:03:07PM +0100, George Dunlap wrote:
> On 15/04/16 16:30, Luis R. Rodriguez wrote:
> > On Fri, Apr 15, 2016 at 10:59:16AM +0100, George Dunlap wrote:
> >> On 14/04/16 20:44, Luis R. Rodriguez wrote:
> >>> No, I meant to ask, would it be possible to make booting HVMLite using EFI
> >>> be optional ? That way if you already support EFI that can be used on
> >>> your entries with some small modifications.
> >>
> >> I wasn't talking about actual non-Linux unikernels; I was talking about using
> >> Linux in the way that unikernels are used ("unikernel-style"). That is, you
> >> boot a minimal Linux image with a small ramdisk and have a single process
> >> running as init. For this use case, even an extra megabyte of guest RAM and
> >> an extra second of boot time is a significant cost. "Use OVMF for domUs" is
> >> an excellent solution for traditional VMs where you boot a full distro, but
> >> would impose a significant cost on using Linux in unikernel-style VMs.
> >
> > Understood.
> >
> >> Whether a stripped-down EFI support would be sufficiently low memory /
> >> latency for such workloads is an open question that would take time and
> >> engineering effort to discover. And in any case, it would certainly
> >> require the maintenance of Yet Another Bootloader in the Xen source tree.
> >
> > OVMF is used by ARM, so using it should be a matter of adaptation, and
> > some changes other than perhaps DT use. Question still stands though,
> > would it be possible to have HVMLite be using EFI as an option so that
> > some users could opt-in if they so wish ?
>
> Well we definitely intend to have a mode of PVH* which boots OVMF to
> EFI-enabled guests, if that's what you mean. For one thing, that should
> in theory allow us to boot Windows guests without needing to spin up
> qemu to emulate any devices (since OVMF will be able to access the PV
> devices until the Windows PV drivers come up).

OK so for Windows x86 HVMLite will need to go the EFI boot route for sure,
only it will use OVMF ?

> Booting to EFI-enabled
> distros is certainly something we want as well.
>
> But we need an option for dom0, and ideally we'd like an option for
> lightweight Linux guests. It's using EFI for those purposes that we're
> pushing back on.
>
> -George
>
> * I'm saying PVH because I hope when everything is sorted out we can
> just call HVMLite PVH again.

OK sure, so as long as:

* Other OSes don't have to use EFI
* We keep a Linux non-EFI lightweight boot mechanism

Then the OVMF / EFI route (perhaps alternatives might be minimal EFI
emulation) is still a prospect on the table long term.

Luis

2016-04-15 22:53:34

by Matt Fleming

Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry

(Sorry, just realised I never replied to this)

On Wed, 13 Apr, at 01:59:10PM, Roger Pau Monné wrote:
>
> Is this header compatible with the ELF header? Can both co-exist in the
> same binary without issues?

Nope, they cannot. We get away with mixing bzImage headers and PE/COFF
headers for the EFI stub because bzImage has no magic string and
contains historical code at the start of the file. The code is never
executed in practice nowadays (it tells the user to use a boot loader
instead of direct execution) so we just stamp a PE/COFF header over it
when CONFIG_EFI_STUB is enabled.
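
To make the header clash concrete, here is a small C sketch that looks for each format's signature. The offsets are the documented ones: the ELF magic and the "MZ" signature of a PE/COFF image both sit at offset 0, while the x86 boot protocol puts its 0xAA55 boot flag at 0x1FE and the "HdrS" signature at 0x202. The function name `classify_image` and the flag names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum image_flags {
    IMG_ELF     = 1 << 0,   /* 0x7f 'E' 'L' 'F' at offset 0 */
    IMG_PE_MZ   = 1 << 1,   /* "MZ" signature at offset 0 */
    IMG_BZIMAGE = 1 << 2,   /* 0xAA55 at 0x1FE and "HdrS" at 0x202 */
};

/* Report which header signatures are present in the buffer. */
unsigned classify_image(const uint8_t *buf, size_t len)
{
    unsigned flags = 0;

    assert(buf != NULL);

    /* The literal is split so that 'E' is not read as part of the \x escape. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        flags |= IMG_ELF;

    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        flags |= IMG_PE_MZ;

    /* The boot flag is stored little-endian: 0x55 then 0xAA. */
    if (len >= 0x206 &&
        buf[0x1FE] == 0x55 && buf[0x1FF] == 0xAA &&
        memcmp(buf + 0x202, "HdrS", 4) == 0)
        flags |= IMG_BZIMAGE;

    return flags;
}
```

A buffer can report `IMG_PE_MZ | IMG_BZIMAGE` together because the two signatures live at different offsets. `IMG_ELF` and `IMG_PE_MZ` can never both be set, since each needs the first bytes of the file for itself.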