Date: Wed, 13 Apr 2016 20:29:51 +0200
From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: Juergen Gross <jgross@suse.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>,
        David Vrabel <david.vrabel@citrix.com>,
        Julien Grall <julien.grall@arm.com>,
        Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
        Andrew Cooper <andrew.cooper3@citrix.com>,
        Boris Ostrovsky <boris.ostrovsky@oracle.com>,
        Roger Pau =?iso-8859-1?Q?Monn=E9?= <roger.pau@citrix.com>,
        Matt Fleming <matt@codeblueprint.co.uk>,
        Charles Arndol <carnold@suse.com>, Jim Fehlig <jfehlig@suse.com>,
        Jan Beulich <JBeulich@suse.com>,
        Daniel Kiper <daniel.kiper@oracle.com>,
        "H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org,
        Gary Lin <GLin@suse.com>, Andy Lutomirski <luto@amacapital.net>,
        Borislav Petkov <bp@alien8.de>, joeyli <jlee@suse.com>,
        Jeffrey Cheung <JCheung@suse.com>, Michael Chang <MChang@suse.com>,
        =?utf-8?Q?Vojt=C4=9Bch_Pavl=C3=ADk?= <vojtech@suse.cz>,
        linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
        Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry
Message-ID: <20160413182951.GW1990@wotan.suse.de>
References: <20160406024027.GX1990@wotan.suse.de>
 <5704D978.1050101@citrix.com>
 <20160408204032.GR1990@wotan.suse.de>
 <570B3228.90400@suse.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <570B3228.90400@suse.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 7973
Lines: 175

On Mon, Apr 11, 2016 at 07:12:08AM +0200, Juergen Gross wrote:
> On 08/04/16 22:40, Luis R. Rodriguez wrote:
> > On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
> >> On 06/04/16 03:40, Luis R. Rodriguez wrote:
> >>>
> >>>     * You don't need full EFI emulation
> >>
> >> I think needing any EFI emulation inside Xen (which is where it would
> >> need to be for dom0) is not suitable because of the increase in
> >> hypervisor ABI.
> > 
> > Is this because of timing on architecture / design of HVMLite, or
> > a general position that the complexity to deal with EFI emulation
> > is too much for Xen's taste ?
> 
> The Xen hypervisor should be as small as possible. Adding an EFI
> emulator will be adding quite some code. This should be done after a
> very thorough evaluation only.

Sure.

> > ARM already went the EFI entry way for domU -- it went the OVMF route,
> > would such a possibility be possible for x86 domU HVMLite ? If not why
> > not, I mean it would seem to make sense to at least mimic the same type
> > of early boot environment, and perhaps there are some lessons to be
> > learned from that effort too.
> 
> The final solution must be appropriate for dom0, too. So don't try
> to limit the discussion to domU. If dom0 isn't going to be acceptable
> there will no need to discuss domU.

Understood. George noted that on ARM dom0 still uses the ARM native entry
point; it seems to accomplish this by using a device tree node. I'll
chime in on that in another thread.

> > Are there some lessons to be learned with ARM's effort? What are they?
> > If that could be re-done again with any type of cleaner path, what
> > could that be that could help the x86 side ?
> > 
> > Although emulating EFI may require work, some folks have pointed out
> > that the amount of work may not be that much. If that is done can
> > we instead rely on the same code to replace OVMF to support both
> > Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
> > this ?
> > 
> >> I also still do not understand your objection to the current tiny stub.
> > 
> > Its more of a hypothetical -- can an EFI entry be used instead given
> > it already does exactly what the new small entry does ? Its also rather
> > odd to add a new entry without evaluating fully a possible alternative
> > that would provide the same exact mechanism.
> 
> The interface isn't the new entry only. It should be evaluated how much
> of the early EFI boot path would be common to the HVMlite one.

We also have other asm code which can be shared. I'll reply to Boris'
original e-mail with what I can identify as perhaps shareable. There is
obviously more, as you allude to.

> What would be gained by using the same entry but having two different boot
> paths after it?

It's a good question. In summary, for me it would be the push for sharing more
code and the push for semantics on early boot to address differences
proactively, and ultimately it may enable us to bring the old PV boot path
closer as well.

I'll elaborate on this but first let's clarify why a new entry is used for
HVMlite to start off with:

  1) Xen ABI has historically not wanted to set up the boot params for Linux
     guests, instead it insists on letting the Linux kernel Xen boot stubs fill
     that out for it. This sticking point has meant a boot stub is needed.
     The HVMLite boot entry tries to bring the boot entry paths closer as it
     leverages more of the HVM boot path philosophy to mimic the regular PC boot
     path.

     Is HVMLite supposed to support legacy PV guests as well BTW ?

     The reason I'm highlighting the Xen ABI as a *reason* alone is that even
     with today's large discrepancy on the old PV boot path I believe we can
     bring the boot paths closer together if the Xen ABI was slightly
     flexible about this. I've highlighted before how I believe that is
     possible, *iff* the Xen ABI would at the very least set 2 things only:

     a) Hypervisor type
     b) A custom data pointer

     This would then enable a single boot entry on the guest to handle things:

	Pseudo code:

	startup_32()                         startup_64()
	       |                                  |
	       |                                  |
	       V                                  V
	pre_hypervisor_stub_32()        pre_hypervisor_stub_64()
	       |                                  |
	       |                                  |
	       V                                  V
	 [existing startup_32()]       [existing startup_64()]
	       |                                  |
	       |                                  |
	       V                                  V
	post_hypervisor_stub_32()       post_hypervisor_stub_64()

     
     If the Xen ABI was flexible about setting a hypervisor type and custom
     data pointer then we would have handlers for it, and in those handlers
     Xen code can do whatever it thinks is needed for its own guest types. It
     could also continue to set the zero page on its own as it sees fit.

     Again, note that if this is done it could also mean bringing
     the old PV boot path closer as well... so this is not just a prospect
     for HVMLite but also for old PV guests.

  2) Because of 1) we have no formal semantics available for early boot
     code, and so severe differences can best be addressed
     by yet another boot entry. This has often meant not addressing
     or not knowing if we've addressed real differences between the different
     entries. Case in point, dead code [0]. How do we know we will not run
     certain code that should not run for the different entries ? Without
     *any* semantics later in boot code to distinguish where we came from
     and because we strive to build single kernels with different possible
     run time environments it means we have tons of code available to
     execute / run that we may not need.

     Because of the lack of semantics we may still have dead code prospects
     with the new HVMLite entry. How are we sure there are no differences ?

[0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html

  3) Unikernel / other OS requirements: this is really tied to 2) but even if
     we tried to evolve the Xen ABI it would mean considering existing solutions
     out there. Things to consider as an example: FreeBSD doesn't have an EFI
     entry, unikernels want a simple boot entry.
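
To make the idea in 1) a bit more concrete, here is a rough userspace C sketch
of dispatching on a hypervisor type and a custom data pointer around the
existing startup code. This is not kernel code and not a proposal for the
actual ABI layout: every name in it (hv_type, hv_boot_info, hv_boot_ops,
boot_entry, the stub functions) is made up for illustration, and the trace
string only exists so the call order can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: the only two things the hypervisor ABI would have to set. */
enum hv_type { HV_NONE = 0, HV_XEN_PV, HV_XEN_HVMLITE, HV_TYPE_MAX };

struct hv_boot_info {
	enum hv_type type;	/* a) hypervisor type */
	void *data;		/* b) custom data pointer, opaque to common code */
};

/* Per-hypervisor handlers; either stub may be left NULL. */
struct hv_boot_ops {
	void (*pre_stub)(void *data);
	void (*post_stub)(void *data);
};

/* Records the order in which things ran, for demonstration only. */
static char boot_trace[128];

static void trace(const char *s)
{
	strncat(boot_trace, s, sizeof(boot_trace) - strlen(boot_trace) - 1);
}

static void xen_pv_pre(void *data)       { (void)data; trace("pv_pre;"); }
static void xen_hvmlite_pre(void *data)  { (void)data; trace("hvmlite_pre;"); }
static void xen_hvmlite_post(void *data) { (void)data; trace("hvmlite_post;"); }

/* HV_NONE (bare metal) has no handlers: its entry stays zero-initialized. */
static const struct hv_boot_ops hv_ops[HV_TYPE_MAX] = {
	[HV_XEN_PV]      = { xen_pv_pre, NULL },
	[HV_XEN_HVMLITE] = { xen_hvmlite_pre, xen_hvmlite_post },
};

/* Stand-in for the existing startup_32() / startup_64() body. */
static void existing_startup(void) { trace("startup;"); }

/* Single entry: pre stub, common startup, post stub. Returns the trace. */
const char *boot_entry(const struct hv_boot_info *bi)
{
	const struct hv_boot_ops *ops;

	assert(bi != NULL && bi->type < HV_TYPE_MAX);
	ops = &hv_ops[bi->type];
	boot_trace[0] = '\0';

	if (ops->pre_stub)
		ops->pre_stub(bi->data);
	existing_startup();
	if (ops->post_stub)
		ops->post_stub(bi->data);
	return boot_trace;
}
```

With type set to HV_NONE only the common startup runs, so bare metal pays
nothing beyond two NULL checks; what goes behind the data pointer stays
entirely up to the hypervisor side.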

With this in mind then, here is what I can think of:

Cons of using the same entry but having two different boot paths:

  * Pushes the Xen ABI, needs to make everyone happy, this is hard
  * Perhaps harder to implement

Gains of striving to use the same entry but having two different boot paths:

 * Helps to share more code easily
 * Reduces attack surface
 * Requires us to have semantics for early boot; this has a series of
   side benefits:
   - Means you should try to address differences explicitly rather than
     implicitly -- case in point Dead Code

> You still need a way to distinguish between bare metal
> EFI and HVMlite.

Great point! This is the semantics aspect. The approach of a new entry for
HVMlite deals with this by making the differences implicit in the new entry point.
My call for addressing this through a hypervisor type was to see if we can
get those semantics added explicitly so we can also later address dead
code concerns for the new HVMLite guest type.
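
As a rough illustration of what explicit semantics could buy us for dead code,
here is a small userspace C sketch where each early init routine declares
which boot environments it may run on, and anything not declared for the
current environment is skipped. All names here (boot_env, early_init,
run_early_inits, legacy_probe, common_setup) are hypothetical, and which
routine is valid on which environment is invented for the example, not a
statement about what HVMLite actually supports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical boot environments; real code would learn this from the ABI. */
enum boot_env { ENV_BARE_METAL, ENV_XEN_PV, ENV_XEN_HVMLITE, ENV_MAX };

#define ENV_BIT(e) (1u << (e))

struct early_init {
	const char *name;
	unsigned int supported;	/* mask of ENV_BIT() values where fn may run */
	void (*fn)(void);
};

/* Call counters, only here so the behaviour can be observed. */
static int legacy_probe_calls;
static int common_setup_calls;

static void legacy_probe(void) { legacy_probe_calls++; }
static void common_setup(void) { common_setup_calls++; }

static const struct early_init early_inits[] = {
	/* Made-up example: only meaningful on bare metal. */
	{ "legacy_probe", ENV_BIT(ENV_BARE_METAL), legacy_probe },
	/* Made-up example: safe everywhere. */
	{ "common_setup",
	  ENV_BIT(ENV_BARE_METAL) | ENV_BIT(ENV_XEN_PV) | ENV_BIT(ENV_XEN_HVMLITE),
	  common_setup },
};

/* Run only what is declared for env; return how many routines were skipped. */
size_t run_early_inits(enum boot_env env)
{
	size_t i, skipped = 0;

	assert(env < ENV_MAX);
	for (i = 0; i < sizeof(early_inits) / sizeof(early_inits[0]); i++) {
		assert(early_inits[i].fn != NULL);
		if (!(early_inits[i].supported & ENV_BIT(env))) {
			skipped++;	/* would be dead code on this environment */
			continue;
		}
		early_inits[i].fn();
	}
	return skipped;
}
```

The point is only that the differences are written down in one table rather
than implied by which entry point we happened to come in through.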

Part of my own interest in an EFI entry here is that EFI could be used to help
expand on the semantics in an OS-agnostic form rather than pushing the x86 boot
protocol further. That seems to have its own set of drawbacks though.


> And Xen needs a way to find out whether a kernel is
> supporting HVMlite to boot it in the correct mode.

How was Xen going to find out if new kernels had HVMlite support with the
new entry ? An ELFNOTE() ? If an entry is shared could we not use an
ELFNOTE() for this too ?

  Luis