Date: Wed, 13 Apr 2016 20:29:51 +0200
From: "Luis R. Rodriguez"
To: Juergen Gross
Cc: "Luis R. Rodriguez", David Vrabel, Julien Grall, Stefano Stabellini, Andrew Cooper, Boris Ostrovsky, Roger Pau Monné, Matt Fleming, Charles Arndol, Jim Fehlig, Jan Beulich, Daniel Kiper, "H. Peter Anvin", x86@kernel.org, Gary Lin, Andy Lutomirski, Borislav Petkov, joeyli, Jeffrey Cheung, Michael Chang, Vojtěch Pavlík, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, Linus Torvalds
Subject: Re: HVMLite / PVHv2 - using x86 EFI boot entry
Message-ID: <20160413182951.GW1990@wotan.suse.de>
References: <20160406024027.GX1990@wotan.suse.de> <5704D978.1050101@citrix.com> <20160408204032.GR1990@wotan.suse.de> <570B3228.90400@suse.com>
In-Reply-To: <570B3228.90400@suse.com>

On Mon, Apr 11, 2016 at 07:12:08AM +0200, Juergen Gross wrote:
> On 08/04/16 22:40, Luis R. Rodriguez wrote:
> > On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
> >> On 06/04/16 03:40, Luis R. Rodriguez wrote:
> >>>
> >>> * You don't need full EFI emulation
> >>
> >> I think needing any EFI emulation inside Xen (which is where it would
> >> need to be for dom0) is not suitable because of the increase in
> >> hypervisor ABI.
> >
> > Is this because of timing on architecture / design of HVMLite, or
> > a general position that the complexity to deal with EFI emulation
> > is too much for Xen's taste ?
>
> The Xen hypervisor should be as small as possible. Adding an EFI
> emulator will be adding quite some code. This should be done after a
> very thorough evaluation only.

Sure.

> > ARM already went the EFI entry way for domU -- it went the OVMF route,
> > would such a possibility be possible for x86 domU HVMLite ? If not why
> > not, I mean it would seem to make sense to at least mimic the same type
> > of early boot environment, and perhaps there are some lessons to be
> > learned from that effort too.
>
> The final solution must be appropriate for dom0, too. So don't try
> to limit the discussion to domU. If dom0 isn't going to be acceptable
> there will be no need to discuss domU.

Understood. George noted that on ARM, dom0 still uses the ARM native entry
point; it seems to accomplish this by using a device tree node. I'll chime
in on that in another thread.

> > Are there some lessons to be learned with ARM's effort? What are they?
> > If that could be re-done again with any type of cleaner path, what
> > could that be that could help the x86 side ?
> >
> > Although emulating EFI may require work, some folks have pointed out
> > that the amount of work may not be that much. If that is done can
> > we instead rely on the same code to replace OVMF to support both
> > Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
> > this ?
> >
> >> I also still do not understand your objection to the current tiny stub.
> >
> > It's more of a hypothetical -- can an EFI entry be used instead given
> > it already does exactly what the new small entry does ? It's also rather
> > odd to add a new entry without evaluating fully a possible alternative
> > that would provide the same exact mechanism.
>
> The interface isn't the new entry only.
> It should be evaluated how much
> of the early EFI boot path would be common to the HVMlite one.

We also have other asm code which can be shared. I'll reply to Boris'
original e-mail with what I can identify as possibly sharable. There is
obviously more, as you allude to.

> What would be gained by using the same entry but having two different boot
> paths after it?

It's a good question. In summary, for me it would be the push to share more
code and the push for early boot semantics that address differences
proactively; ultimately it may also help us bring the old PV boot path
closer. I'll elaborate on this, but first let's clarify why a new entry is
used for HVMlite to start with:

1) The Xen ABI has historically not wanted to set up the boot params for
   Linux guests; instead it insists on letting the Linux kernel's Xen boot
   stubs fill them out. This sticking point is why a boot stub has been
   needed. The HVMLite boot entry tries to bring the boot entry paths closer
   together, as it leverages more of the HVM boot path philosophy to mimic
   the regular PC boot path.

   Is HVMLite supposed to support legacy PV guests as well, BTW?
   The reason I'm highlighting the Xen ABI as a *reason* on its own is that,
   even with today's large discrepancy in the old PV boot path, I believe we
   can bring the boot paths closer together if the Xen ABI were slightly
   flexible about this. I've described before how I believe that is
   possible, *iff* the Xen ABI would, at the very least, set only 2 things:

     a) Hypervisor type
     b) A custom data pointer

   This would then enable a single boot entry on the guest to handle it:

   Pseudo code:

     startup_32()                    startup_64()
          |                               |
          |                               |
          V                               V
     pre_hypervisor_stub_32()        pre_hypervisor_stub_64()
          |                               |
          |                               |
          V                               V
     [existing startup_32()]         [existing startup_64()]
          |                               |
          |                               |
          V                               V
     post_hypervisor_stub_32()       post_hypervisor_stub_64()

   If the Xen ABI were flexible about setting a hypervisor type and a custom
   data pointer, we would then have handlers for it, and in them Xen could do
   whatever it thinks is needed for its own guest types. It could also
   continue to set up the zero page on its own as it sees fit. Again, note
   that if this is done it could also mean bringing the old PV boot path
   closer too... so this is not just a prospect for HVMLite but also for old
   PV guests.

2) Because of 1), we have no formal semantics available for early boot code,
   and so severe differences are, at best, addressed by yet another boot
   entry. This has often meant not addressing real differences between the
   entries, or not knowing whether we have addressed them. Case in point:
   dead code [0]. How do we know we will not run code that should not run
   for a given entry? Without *any* semantics later in boot code to
   distinguish where we came from, and because we strive to build single
   kernels with different possible run-time environments, we have tons of
   code available to execute that we may not need. Because of the lack of
   semantics we may still have dead code prospects with the new HVMLite
   entry. How can we be sure there are no differences?
   [0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html

3) Unikernel / other OS requirements: this is really tied to 2), but even if
   we tried to evolve the Xen ABI it would mean considering existing
   solutions out there. Things to consider, as an example: FreeBSD doesn't
   have an EFI entry, and unikernels want a simple boot entry.

With this in mind, here is what I can think of:

Cons of using the same entry but having two different boot paths:

  * Pushes the Xen ABI, which needs to make everyone happy; this is hard
  * Perhaps harder to implement

Gains of striving to use the same entry but having two different boot paths:

  * Helps to share more code easily
  * Reduces attack surface
  * Requires us to have semantics for early boot; this has a series of side
    benefits:
    - Means you should try to address differences explicitly rather than
      implicitly -- case in point, dead code

> You still need a way to distinguish between bare metal
> EFI and HVMlite.

Great point! This is the semantics aspect. The new-entry approach for
HVMlite deals with this by making the differences implicit in the new entry
point. My call for addressing this through a hypervisor type was to see if
we can get those semantics added explicitly, so we can also later address
dead code concerns for the new HVMLite guest type. Part of my own interest
in an EFI entry here is that EFI could be used to help expand on the
semantics in an OS-agnostic form rather than pushing the x86 boot protocol
further. That seems to have its own set of drawbacks though.

> And Xen needs a way to find out whether a kernel is
> supporting HVMlite to boot it in the correct mode.

How was Xen going to find out whether new kernels had HVMlite support with
the new entry? An ELFNOTE()? If an entry is shared, could we not also use an
ELFNOTE() for this?

Luis
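On point 1) above (who fills in the boot params): as a rough illustration of the work a Xen boot stub takes on, here is a C sketch that copies fields from a hypervisor start-info structure into a zero-page-like structure. It is not the Linux stub; the real one starts in asm and reads guest-physical addresses directly. The `sketch_` structs, their layout and the magic value are modeled on Xen's public `hvm_start_info` as understood here and should be treated as assumptions; the boot params struct is a hypothetical four-field subset, and treating the first module as the initrd is this sketch's own choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed magic value, modeled on Xen's public start_info.h. */
#define SKETCH_HVM_START_MAGIC 0x336ec578u

/* Hypothetical mirror of the start info the hypervisor hands over. */
struct sketch_hvm_start_info {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
    uint32_t nr_modules;
    uint64_t modlist_paddr;
    uint64_t cmdline_paddr;
    uint64_t rsdp_paddr;
};

/* Hypothetical module list entry (a real stub reaches it via modlist_paddr). */
struct sketch_modlist_entry {
    uint64_t paddr;
    uint64_t size;
    uint64_t cmdline_paddr;
    uint64_t reserved;
};

/* Hypothetical subset of the x86 zero page (struct boot_params). */
struct sketch_boot_params {
    uint64_t cmd_line_ptr;
    uint64_t ramdisk_image;
    uint64_t ramdisk_size;
    uint64_t acpi_rsdp_addr;
};

/* Translate hypervisor start info into the boot params the native path
 * expects. Returns 0 on success, -1 if the start info is not recognized. */
static int sketch_fill_boot_params(const struct sketch_hvm_start_info *si,
                                   const struct sketch_modlist_entry *mods,
                                   struct sketch_boot_params *bp)
{
    if (si->magic != SKETCH_HVM_START_MAGIC)
        return -1;

    memset(bp, 0, sizeof(*bp));
    bp->cmd_line_ptr = si->cmdline_paddr;
    bp->acpi_rsdp_addr = si->rsdp_paddr;

    /* Sketch's own choice: the first module is treated as the initrd. */
    if (si->nr_modules > 0 && mods != NULL) {
        bp->ramdisk_image = mods[0].paddr;
        bp->ramdisk_size = mods[0].size;
    }
    return 0;
}
```

If the hypervisor filled in the zero page itself, this whole translation step, and the stub that carries it, would go away; that is the trade-off point 1) is about.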
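The single-entry pseudo code from point 1), together with the "hypervisor type" answer to distinguishing bare metal from HVMlite, can be modeled in C as below. This is a sketch under stated assumptions: the ABI would pass a hypervisor type plus an opaque data pointer, all `sketch_` names and the hook table are hypothetical, and the real pre/post stubs would be asm running before any C environment exists, so C here only models the control flow. Recording the type once at entry is what gives later boot code explicit semantics to check, instead of inferring them from which entry point ran.

```c
#include <stddef.h>

/* Hypothetical hypervisor types the ABI would publish at entry. */
enum sketch_hv_type {
    SKETCH_HV_NONE = 0,      /* bare metal */
    SKETCH_HV_XEN_PV,
    SKETCH_HV_XEN_HVMLITE,
    SKETCH_HV_MAX
};

/* Per-hypervisor hook; receives the opaque custom data pointer. */
typedef int (*sketch_hv_stub_fn)(void *data);

struct sketch_hv_stubs {
    sketch_hv_stub_fn pre;   /* runs before the existing startup code */
    sketch_hv_stub_fn post;  /* runs after it */
};

/* Recorded once at entry so later boot code can ask where it came from. */
static enum sketch_hv_type sketch_boot_hv_type = SKETCH_HV_NONE;

/* Hypothetical Xen hooks: the pre hook insists on its custom data pointer. */
static int sketch_xen_pre(void *data)  { return data != NULL ? 0 : -1; }
static int sketch_xen_post(void *data) { (void)data; return 0; }

/* Bare metal has no hooks (NULL entries from the designated initializer). */
static const struct sketch_hv_stubs sketch_stubs[SKETCH_HV_MAX] = {
    [SKETCH_HV_XEN_PV]      = { sketch_xen_pre, sketch_xen_post },
    [SKETCH_HV_XEN_HVMLITE] = { sketch_xen_pre, sketch_xen_post },
};

/* Stand-in for the existing startup_64() body. */
static int sketch_existing_startup(void) { return 0; }

/* Single entry: pre stub -> existing startup -> post stub. */
static int sketch_startup(enum sketch_hv_type type, void *data)
{
    const struct sketch_hv_stubs *s;
    int ret;

    if ((unsigned int)type >= (unsigned int)SKETCH_HV_MAX)
        return -1;

    sketch_boot_hv_type = type;
    s = &sketch_stubs[type];

    if (s->pre && (ret = s->pre(data)) != 0)
        return ret;
    ret = sketch_existing_startup();
    if (ret == 0 && s->post)
        ret = s->post(data);
    return ret;
}

/* Explicit semantics for later boot code, instead of implicit entry paths. */
static int sketch_running_on(enum sketch_hv_type type)
{
    return sketch_boot_hv_type == type;
}
```

With this shape, a path that must never run under HVMlite can test `sketch_running_on(SKETCH_HV_XEN_HVMLITE)` explicitly, which is the kind of check that makes dead code visible rather than implied by the entry point.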
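On the closing ELFNOTE() question: if the answer is an ELF note, the loader-side check amounts to scanning the kernel's note section for a note named "Xen" of a given type. A minimal sketch follows, using `Elf64_Nhdr` from glibc's `<elf.h>`. The note type value 18 for `XEN_ELFNOTE_PHYS32_ENTRY` is assumed here from Xen's public `elfnote.h`, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <elf.h>

/* Assumed value of XEN_ELFNOTE_PHYS32_ENTRY (the HVMlite entry point note). */
#define SKETCH_XEN_ELFNOTE_PHYS32_ENTRY 18u

/* ELF note name and descriptor fields are padded to 4-byte boundaries. */
static size_t sketch_align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Scan a raw note section for a note named "Xen" with the given type.
 * On success stores the descriptor pointer and size and returns 1; else 0. */
static int sketch_find_xen_note(const unsigned char *sec, size_t len,
                                uint32_t type,
                                const unsigned char **desc, size_t *desc_len)
{
    size_t off = 0;

    while (len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;
        size_t name_off, desc_off, next;

        memcpy(&nh, sec + off, sizeof(nh));   /* avoid unaligned access */
        name_off = off + sizeof(nh);
        desc_off = name_off + sketch_align4(nh.n_namesz);
        next = desc_off + sketch_align4(nh.n_descsz);
        if (desc_off > len || next > len || next <= off)
            return 0;                          /* malformed section */

        /* "Xen" plus its NUL terminator gives n_namesz == 4. */
        if (nh.n_type == type && nh.n_namesz == 4 &&
            memcmp(sec + name_off, "Xen", 4) == 0) {
            *desc = sec + desc_off;
            *desc_len = nh.n_descsz;
            return 1;
        }
        off = next;
    }
    return 0;
}
```

The same scan would work unchanged if a shared entry were advertised through a different note type; only the type constant changes.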