2006-08-03 10:14:23

by Zachary Amsden

Subject: A proposal - binary

I would like to propose an interface for linking a GPL kernel,
specifically, Linux, against binary blobs. Stop. Before you condemn this
as evil, let me explain in precise detail what a binary blob is.

First, there are two kinds of binary blobs. There is the evil, malignant
kind, which uses manipulative and despicable techniques to subvert the GPL
by copying code into Linux, and the most evil kind, which uses a GPL
wrapper to export GPL-only symbols to the binary blob. This is
unconditionally wrong. I do not support this kind of use. These evil blobs
are used to lock people into a particular type of protocol or proprietary
hardware interface. In my personal opinion, they should be unconditionally
banned, or at least phased out rapidly from any GPL kernel. I have been
frustrated by this in the past, where binary filesystem modules would not
allow me access to AFS when upgrading to a new kernel version. I do not
wish that on anyone.

But there is also another kind of binary blob. These are not the evil,
nasty subversion type blobs, but a benign kind that actually exists in
binary form not to close off choice, but to open it. This is exactly the
kind of binary interface we are proposing with the VMI design. Only with a
specific ABI can you guarantee future compatibility. This is exactly the
same thing I believe some hardware vendors are trying to do. When you have
a fairly complex interaction with the hardware layer, you have a lot of
code which suddenly becomes hardware dependent. When that hardware is
actually changing rapidly, you have a serious problem keeping released
drivers for that hardware in sync. Software release cycles are becoming
much longer, and delivering new capabilities to average consumers outside
of that software release cycle is a very difficult problem to solve. As a
result, vendors build some smarts into the hardware - a firmware piece that
can be loaded and actually run on the processor. This firmware allows the
same driver to be used for many different versions of the hardware,
allowing a single software release to support multiple versions of yet to
be released hardware. It is only natural to run this privileged software
inside a privileged context - the kernel.

In our case, this "hardware" is actually the virtual machine. We must deal
with changes to the underlying hardware, as they are happening rapidly, and
we must support future compatibility for customers that decide to start
using a virtual machine in 2006 - it is a virtual machine, after all, and
it should continue running in 2016, no matter what the underlying hardware
at that time will look like. In this sense, we have an even larger future
compatibility problem to solve than most hardware vendors. So it is
critical to get an interface that works now.

The essence of our interface is a separation between the kernel, and the
hypervisor compatibility layer, which we call the VMI. This layer is
completely separate from the kernel, and absolutely cannot be compiled into
the kernel. Why? Because doing so negates all of the benefits this layer is
supposed to provide. It is separate from the kernel not to lock anyone into
a proprietary design or prevent anyone from distributing a working kernel.
It is separate to allow the hypervisor backend to change radically without
introducing any changes whatsoever into the kernel. This is absolutely
required for future compatibility - with new versions of each hypervisor
being released regularly, and new competing hypervisors emerging, it is a
necessity. This allows the hypervisor development process, as well as the
Linux kernel development process, to continue unimpeded in the face of
rapid change on each side. Having an open binary interface encourages
growth and competition in this area, rather than locking anyone into a
proprietary design. It also does not stop anyone from distributing a
working, fully source compiled kernel in any circumstance I can imagine. If
you don't have the firmware for a VE-10TB network card compiled into your
kernel, but also don't have a VE-10TB network card, you haven't been
deprived of anything, and have very little to complain about. Provided, of
course, that when you do buy a VE-10TB network card, it happily provides
the required firmware to you at boot time.

On the other hand, the GPL is not friendly to this type of linking against
binary fragments that come from firmware. But they really, absolutely, must
be separate from the kernel. There is no argument against this from a
feature point of view. But there is also no reason that they must be
binary-only. The interface between the two components surely must be
binary - just as the interface between userspace and the kernel, or between
the apps and glibc must be binary. This means the code from one layer is
visible to the other purely as a binary "blob". But not an evil one. And
under NO circumstances is it required to be a CLOSED source binary blob. In
fact, why can't it be open? In the event of a firmware bug, in fact, it is
very desirable to have this software be open so that it can be fixed - and
reflashed onto the card, where the firmware belongs.

Let me illustrate the major differences between an "evil" binary blob, a
typical vendor designed hardware support layer, a well designed, open
binary interface, and a traditional ROM or extensible firmware layer. I
think you will see why our VMI layer is quite similar to a traditional ROM,
and very dissimilar to an evil GPL-circumvention device. I can't truly
speak for the video card vendors who have large binary modules in the
kernel, but I would imagine I'm not far off in my guesses, and they can
correct me if I am wrong.

                                         EVIL    VENDOR     VMI      ROM
Module runs at kernel privilege level:   YES     YES        YES      MAYBE (*)
Module is copied into the kernel:        YES     MAYBE      NO       NO
Module is part of kernel address space:  YES     YES        NO(+)    ??
Module has hooks back into kernel:       YES     MAYBE      NO       NO
Kernel has hooks into module:            YES     YES        YES      YES
Module has proprietary 3rd party code:   MAYBE   MAYBE(?)   NO       YES
Module has significant software IP:      YES     MAYBE(?)   NO       MAYBE (?)
Module is open source:                   NO      MAYBE      MAYBE    NO

(*) In the ROM case, sometimes these are run in V8086 mode, not at full
hardware privilege level, and whether the < 1MB physical ROM region is part
of the "kernel" address space proper, or just happens to appear in kernel
address space as a result of linear mapping of physical space is a
debatable matter.

(+) The VMI layer is not technically part of the kernel address space. It
is never mapped by the kernel, and merely exists magically hanging out in
virtual address space above the top of the fixmap, in hypervisor address
space. But it can be read and called into by the kernel, so whether this
constitutes being part of the same address space is a dicey matter of
precise definition. I would propose that only supervisor level pages that
are allocated, mapped and controlled by the kernel constitute the kernel
address space, or alternately, the kernel address space consists of the
linear range of virtual address space for which it can create
supervisor-level mappings.

(?) There are only two reasonable objections I can see to open sourcing the
binary layer. One is revealing IP by letting people see the code. This is
really a selfish concern, not a justification for keeping the code binary
only, while still allowing it the privilege of running in the kernel
address space. The other objection I see is if that code has 3rd party
pieces in it that are unable to be licensed under an open software license.
This really is a hard stopper for open sourcing the code, as the vendor
doesn't actually own the copyright, and thus can't redistribute that code
under a different license. We don't have any such restrictions, as we wrote
all our code ourselves, but many ROMs and firmware layers do have such
problems. ROMs might also have some IP in the form of trade secrets
protecting power management or other features run in SMM mode - but this is
just a guess. And IANAL - so take all this with a grain of salt, it is
purely my uninformed personal opinion.

This brings me to the essence of my proposal - why not allow binary "blobs"
to be linked inside Linux, in exactly the same way modules are linked? If
these binary modules are actually open sourced, but separate from the
kernel, is there any reason they can't actually link against GPL symbols
within Linux? What if these modules exposed an ELF header which had exactly
the same information as a typical module? In this case, kernel version is
not relevant, as these modules are truly kernel independent, but
device/module version is. The biggest issue is that the source and build
environment for these modules is not the standard environment -- in fact
many of these binary modules might have extremely bizarre build
requirements. We certainly do. But still there remains no reason that a
well encapsulated build environment and open source distribution of these
modules cannot exist. We are actively working on this for our VMI layer.
Perhaps a good solution to this problem would be to provide a link embedded
in the binary which points to a URL where this environment can be
downloaded and built - or even fully buildable compressed source within the
binary itself for the most friendly binaries with plenty of space to burn.
There may be other issues which I may not be aware of on our end, but that
has no bearing on finding out what the Linux and open source community
wants.
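
To make the idea of module-style metadata concrete, here is a minimal
sketch of the kind of check a loader could apply to such a blob. Everything
in it is an assumption made for illustration: the struct layout, the field
names, the accepted license strings and the function names are
hypothetical, and none of it is taken from the VMI code or from the kernel
module loader.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata a kernel-independent blob might export. */
struct blob_info {
    const char *license;     /* license tag, e.g. "GPL" */
    unsigned    abi_major;   /* interface version: incompatible changes */
    unsigned    abi_minor;   /* interface version: compatible additions */
    const char *source_url;  /* where source and build environment live */
};

/* Assumed policy: only these license tags count as open. */
static int license_is_open(const char *license)
{
    static const char *const ok[] = { "GPL", "GPL v2", "Dual BSD/GPL" };
    size_t i;

    if (license == NULL)
        return 0;
    for (i = 0; i < sizeof(ok) / sizeof(ok[0]); i++)
        if (strcmp(license, ok[i]) == 0)
            return 1;
    return 0;
}

/*
 * Return 1 if the blob may be linked, 0 otherwise. The kernel version is
 * deliberately not consulted; only the blob's interface version is.
 */
int blob_acceptable(const struct blob_info *info,
                    unsigned want_major, unsigned min_minor)
{
    if (info == NULL)
        return 0;
    if (!license_is_open(info->license))
        return 0;
    if (info->source_url == NULL || info->source_url[0] == '\0')
        return 0;   /* no pointer to source: treat as binary-only */
    if (info->abi_major != want_major)
        return 0;
    return info->abi_minor >= min_minor;
}
```

The point of the sketch is only that the version check is against the
interface, not the kernel, and that a missing source pointer is treated the
same as a closed license.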

I propose this as a solution because I would like to see binary (only)
blobs go away, and I would never again like to see hardware vendors design
stupid code which relies on firmware in the operating system to initialize
a hardware device (can I say WinModem?) which is not published and open
code. The point of an interface like this is to open and standardize things
in a way that vendors can benefit from a hardware abstraction layer, and to
make sure that the GPL is not violated in the process of doing so. I would
very much like to see Linux come up with a long term proposal that can
accommodate open firmware which actually runs in kernel mode, while at the
same time assuring that this code is authorized within the license
boundaries afforded to it and available for use by any operating system.

Thank you very much for your patience - I can get verbose, and I've already
gone too long. This is still a very controversial issue, and I wanted to
clarify several points that merit attention before dooming myself to
enduring the yet to come flames. Again, I must say IANAL, and I can't
promise that VMware or any other hardware vendor that wants to use a binary
interface will agree to everything I have proposed. But I would like to
open the issue for discussion and get feedback from the open source
community on this issue, which I think will become more important in the
future.

Zach


2006-08-03 11:17:20

by Arjan van de Ven

Subject: Re: A proposal - binary

On Thu, 2006-08-03 at 03:14 -0700, Zachary Amsden wrote:
> I would like to propose an interface for linking a GPL kernel,
> specifically, Linux, against binary blobs. Stop. Before you condemn
> this as evil, let me explain in precise detail what a binary blob is.


Hi,

you use a lot of words for saying something self contradictory. It's
very simple; based on your mail, there's no reason the VMI gateway page
can't be (also) GPL licensed (you're more than free obviously to dual,
triple or quadruple license it). Once your gateway thing is gpl
licensed, your entire proposal is moot in the sense that there is no
issue on the license front. See: it can be very easy. Much easier than
trying to get a license exception (which is very unlikely you'll get)...


Now you can argue for hours about if such an interface is desirable or
not, but I didn't think your email was about that.

Greetings,
Arjan van de Ven

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com

2006-08-03 12:16:48

by Antonio Vargas

Subject: Re: A proposal - binary

On 8/3/06, Arjan van de Ven wrote:
> On Thu, 2006-08-03 at 03:14 -0700, Zachary Amsden wrote:
> > I would like to propose an interface for linking a GPL kernel,
> > specifically, Linux, against binary blobs. Stop. Before you condemn
> > this as evil, let me explain in precise detail what a binary blob is.
>
>
> Hi,
>
> you use a lot of words for saying something self contradictory. It's
> very simple; based on your mail, there's no reason the VMI gateway page
> can't be (also) GPL licensed (you're more than free obviously to dual,
> triple or quadruple license it). Once your gateway thing is gpl
> licensed, your entire proposal is moot in the sense that there is no
> issue on the license front. See: it can be very easy. Much easier than
> trying to get a license exception (which is very unlikely you'll get)...
>
>
> Now you can argue for hours about if such an interface is desirable or
> not, but I didn't think your email was about that.
>
> Greetings,
> Arjan van de Ven
>
> --
> if you want to mail me at work (you don't), use arjan (at) linux.intel.com

If the essence of using virtual machines is precisely that the machine
acts just as if it was a real hardware one, then we should not need
any modifications to the kernel. So, it would be much better if the
hypervisor was completely transparent and just emulated a native cpu
and a common native set of hardware, which would then work 100% with
the native code in the kernel. This keeps the smarts of virtual
machine management on the hypervisor.

For example, TLB and pagetable handling can be done with 2 interfaces,
one standard via intercepting normal cpu instructions, and a batched
one via a hardware driver with a FIFO on shared memory just like many
graphics cards do to send commands and data to the GPU. I recall this
design was the one used in the mac-on-linux hypervisor for ppc
architecture. Why not for x86 with vt/pacifica extensions? What about
using the same design as on the Sparc T1 port?
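
As a rough sketch of the batched path, assuming a fixed-size ring that
both sides can see: the guest queues pagetable writes and a consumer
applies them in one pass. All names here are hypothetical, and this is not
the mac-on-linux or Sparc T1 interface. A real shared-memory queue would
also need memory barriers and a notification mechanism, both omitted.

```c
#include <stdint.h>

#define QUEUE_SLOTS 8u  /* assumed size; must be a power of two */

/* One deferred pagetable write: where to store, and what. */
struct pte_update {
    uint64_t *pte;
    uint64_t  val;
};

/* Ring shared between producer (guest) and consumer (hypervisor). */
struct update_queue {
    uint32_t head;  /* next slot the consumer reads */
    uint32_t tail;  /* next slot the producer writes */
    struct pte_update slot[QUEUE_SLOTS];
};

/* Guest side: queue one update. Returns 1 if queued, 0 if full. */
int queue_pte_update(struct update_queue *q, uint64_t *pte, uint64_t val)
{
    if (q->tail - q->head == QUEUE_SLOTS)
        return 0;   /* full: caller must flush first */
    q->slot[q->tail % QUEUE_SLOTS].pte = pte;
    q->slot[q->tail % QUEUE_SLOTS].val = val;
    q->tail++;
    return 1;
}

/*
 * Stand-in for the hypervisor side: apply every queued update in order
 * and return how many were applied.
 */
unsigned flush_pte_updates(struct update_queue *q)
{
    unsigned n = 0;

    while (q->head != q->tail) {
        struct pte_update *u = &q->slot[q->head % QUEUE_SLOTS];

        *u->pte = u->val;
        q->head++;
        n++;
    }
    return n;
}
```

The updates only become visible at flush time, which is the trade the
batching makes: fewer transitions into the hypervisor in exchange for
deferred effect.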

--
Greetz, Antonio Vargas aka winden of network

http://network.amigascne.org/

Every day, every year
you have to work
you have to study
you have to scene.

2006-08-03 13:02:11

by Alan

Subject: Re: A proposal - binary

So how does this differ from the twice yearly recycling of the fixed
driver ABI discussion ?

We have a facility for loading binary blobs into the kernel built from
source; it's called insmod.

Alan

2006-08-03 15:18:11

by Rik van Riel

Subject: Re: A proposal - binary

Antonio Vargas wrote:

> If the essence of using virtual machines is precisely that the machine
> acts just as if it was a real hardware one, then we should not need
> any modifications to the kernel. So, it would be much better if the
> hypervisor was completely transparent and just emulated a native cpu
> and a common native set of hardware,

That's not a good idea if you like performance.

Paravirtualization makes a lot of sense.

--
All Rights Reversed

2006-08-03 15:35:15

by Rik van Riel

Subject: Re: A proposal - binary

Zachary Amsden wrote:

> And under NO circumstances is it required to be a CLOSED source binary
> blob. In fact, why can't it be open? In the event of a firmware bug,
> in fact, it is very desirable to have this software be open so that
> it can be fixed

You're making a very good argument as to why we should probably
require that the code linking against such an interface, if we
decide we want one, should be required to be open source.

> I think you will see why our VMI layer is quite similar to a
> traditional ROM, and very dissimilar to an evil GPL-circumvention
> device.

> (?) There are only two reasonable objections I can see to open
> sourcing the binary layer.

Since none of the vendors that might use such a paravirtualized
ROM for Linux actually have one of these reasons for keeping their
paravirtualized ROM blob closed source, I say we might as well
require that it be open source.

As for the evilness of a binary interface - the interface between
kernel and userland is a stable binary interface and is decidedly
non-evil. I could see a similar use for a stable paravirtualization
interface, to make compatibility between Linux and various hypervisor
versions easier.

As long as it's open source so the thing can be debugged :)

--
All Rights Reversed

2006-08-03 16:03:47

by Chris Wright

Subject: Re: A proposal - binary

* Antonio Vargas wrote:
> If the essence of using virtual machines is precisely that the machine
> acts just as if it was a real hardware one, then we should not need
> any modifications to the kernel. So, it would be much better if the
> hypervisor was completely transparent and just emulated a native cpu
> and a common native set of hardware, which would then work 100% with
> the native code in the kernel. This keeps the smarts of virtual
> machine management on the hypervisor.

You have missed the point of paravirtualizing x86.
-chris

2006-08-03 17:57:58

by Zachary Amsden

Subject: Re: A proposal - binary

Antonio Vargas wrote:
> If the essence of using virtual machines is precisely that the machine
> acts just as if it was a real hardware one, then we should not need
> any modifications to the kernel. So, it would be much better if the
> hypervisor was completely transparent and just emulated a native cpu
> and a common native set of hardware, which would then work 100% with
> the native code in the kernel. This keeps the smarts of virtual
> machine management on the hypervisor.

You are basically arguing for full virtualization - which is fine. But
today as it stands it does not provide the level of performance that
paravirtualization does, and in the future, it does little to provide
more advanced virtualization features.

>
> For example, TLB and pagetable handling can be done with 2 interfaces,
> one standard via intercepting normal cpu instructions, and a batched
> one via a hardware driver with a FIFO on shared memory just like many
> graphics cards do to send commands and data to the GPU. I recall this
> design was the one used in the mac-on-linux hypervisor for ppc
> architecture. Why not for x86 with vt/pacifica extensions? What about
> using the same design as on the Sparc T1 port?

You can't use a driver to do this in Linux today, because there are no
hooks you can use for pagetable handling. And you will always achieve
better performance and simplicity by changing the machine definition to
avoid the really nasty cases. Hardware virtualization is simply not
fast enough today. But it also doesn't leave room for the future -
proposals such as the abstract MMU interfaces for Linux which have been
floating around are extremely attractive from a hypervisor point of view
- but there can be no progress until there is some kind of consensus on
what those are, and having an interface in the kernel is a requirement
for any deeper level of paravirtualization.

Zach

2006-08-03 18:08:06

by Zachary Amsden

Subject: Re: A proposal - binary

Arjan van de Ven wrote:
> Hi,
>
> you use a lot of words for saying something self contradictory. It's
> very simple; based on your mail, there's no reason the VMI gateway page
> can't be (also) GPL licensed (you're more than free obviously to dual,
> triple or quadruple license it). Once your gateway thing is gpl
> licensed, your entire proposal is moot in the sense that there is no
> issue on the license front. See: it can be very easy. Much easier than
> trying to get a license exception (which is very unlikely you'll get)...
>
>
> Now you can argue for hours about if such an interface is desirable or
> not, but I didn't think your email was about that.
>

Arjan, thank you for reading my prolific manifesto. I am not arguing
for the interface being desirable, and I don't think I'm being self
contradictory. There was some confusion over technical details of the
VMI gateway page that I wanted to make explicit. Hopefully I have fully
explained those. I'm not trying to get a license exemption, I'm trying
to come up with a model that current and future hardware vendors can
follow when faced with the same set of circumstances.

It was not 100% clear based on conversations at OLS that open-sourcing
the VMI layer met the letter and intent of the kernel license model.
There were some arguments that not having the source integrated into the
kernel violated the spirit of the GPL by not allowing one to distribute
a fully working kernel. I wanted to show that is not true, and the
situation is actually quite unique. Perhaps we can use this to
encourage open sourced firmware layers, instead of trying to ban drivers
which rely on firmware from the kernel.

Zach

2006-08-03 18:29:36

by Antonio Vargas

Subject: Re: A proposal - binary

On 8/3/06, Zachary Amsden wrote:
> Antonio Vargas wrote:
> > If the essence of using virtual machines is precisely that the machine
> > acts just as if it was a real hardware one, then we should not need
> > any modifications to the kernel. So, it would be much better if the
> > hypervisor was completely transparent and just emulated a native cpu
> > and a common native set of hardware, which would then work 100% with
> > the native code in the kernel. This keeps the smarts of virtual
> > machine management on the hypervisor.
>
> You are basically arguing for full virtualization - which is fine. But
> today as it stands it does not provide the highest level of performance
> that paravirtualization does, and in the future, it does little to
> provide more advanced virtualization features.

I realise now that I missed mentioning my point. What I envision as a
stable binary interface for communication between the kernel and the
hypervisor is exactly the current situation, where control passes to the
hypervisor whenever the kernel executes a privileged instruction. I
understand that paravirtualization gives a very good speed boost
(within 5% of native speed IIRC?), but I was also wondering about the
relative speed of running unmodified kernels.

> >
> > For example, TLB and pagetable handling can be done with 2 interfaces,
> > one standard via intercepting normal cpu instructions, and a batched
> > one via a hardware driver with a FIFO on shared memory just like many
> > graphics cards do to send commands and data to the GPU. I recall this
> > design was the one used in the mac-on-linux hypervisor for ppc
> > architecture. Why not for x86 with vt/pacifica extensions? What about
> > using the same design as on the Sparc T1 port?
>
> You can't use a driver to do this in Linux today, because there are no
> hooks you can use for pagetable handling. And you will always achieve
> better performance and simplicity by changing the machine definition to
> avoid the really nasty cases. Hardware virtualization is simply not

Yes, I agree that doing it at the subarch level is good, so that
native subarch gets the original code and the hypervisored subarch
gets the modified one without messing with core kernel code.

> fast enough today. But it also doesn't leave room for the future -
> proposals such as the abstract MMU interfaces for Linux which have been
> floating around are extremely attractive from a hypervisor point of view

I've been fishing in my mail archive and was unable to get any
discussion about abstract mmu... do you know where I can get more info
on that?

> - but there can be no progress until there is some kind of consensus on
> what those are, and having an interface in the kernel is a requirement
> for any deeper level of paravirtualization.
>
> Zach

Here I'd like to say that I mentioned both mol and the sun T1 because
so far we haven't had any discussion on whether any of their
interfaces are worth copying for the x86 case. Also worth looking at
would be the work done by IBM for ppc64 and s390, especially the last
one is prone to be very optimised since their hypervisor work has been
proven to work for a very long time.

I sure don't mean to diss out both vmware and xen work on the field,
given the rocky nature of the x86 architecture, but maybe taking a
look at preexisting work can be a good idea if it hasn't been done
earlier.

--
Greetz, Antonio Vargas aka winden of network

2006-08-03 18:36:39

by Zachary Amsden

Subject: Re: A proposal - binary

Rik van Riel wrote:
> Zachary Amsden wrote:
>
>> And under NO circumstances is it required to be a CLOSED source binary
>> blob. In fact, why can't it be open? In the event of a firmware bug,
>> in fact, it is very desirable to have this software be open so that
>> it can be fixed
>
> You're making a very good argument as to why we should probably
> require that the code linking against such an interface, if we
> decide we want one, should be required to be open source.

Personally, I don't feel a strong requirement that it be open source,
because I don't believe it violates the intent of the GPL license by
crippling free distribution of the kernel, requiring some fee for use,
or doing anything unethical. There have been charges that the VMI layer
is deliberately designed as a GPL circumvention device, which I want to
stamp out now before we try to get any code for integrating to it
upstreamed.


>> I think you will see why our VMI layer is quite similar to a
>> traditional ROM, and very dissimilar to an evil GPL-circumvention
>> device.
>
>> (?) There are only two reasonable objections I can see to open
>> sourcing the binary layer.
>
> Since none of the vendors that might use such a paravirtualized
> ROM for Linux actually have one of these reasons for keeping their
> paravirtualized ROM blob closed source, I say we might as well
> require that it be open source.

I think saying "require" at this point is a bit premature for us -- I'm
trying to prove we're not being evil and subverting the GPL, but I'm
also not guaranteeing yet that we can open-source the code under a
specific license. Sorry about having to doublespeak here - but we have
not yet got a green light to open source the VMI layer under the GPL.
Perhaps there are some other issues I haven't conceived of. We still
have some source separation issues with creating a build environment due
to entangled header files - that is being sorted out, but we're
certainly not ready to distribute an open source buildable VMI layer for
ESX today. I sincerely hope we will be very soon.

>
> As for the evilness of a binary interface - the interface between
> kernel and userland is a stable binary interface and is decidedly
> non-evil. I could see a similar use for a stable paravirtualization
> interface, to make compatibility between Linux and various hypervisor
> versions easier.
>
> As long as it's open source so the thing can be debugged :)

Unfortunately, inlining and patching code will break CFI debug
information! I haven't thought of a way to fix this yet other than
using frame pointers. At least the possibility of debugging exists.

Zach

2006-08-03 18:47:23

by Zachary Amsden

Subject: Re: A proposal - binary

Antonio Vargas wrote:
> I've been fishing in my mail archive and was unable to get any
> discussion about abstract mmu... do you know where I can get more info
> on that?

Here's one useful link:

http://lwn.net/Articles/124961/

>> - but there can be no progress until there is some kind of consensus on
>> what those are, and having an interface in the kernel is a requirement
>> for any deeper level of paravirtualization.
>>
>> Zach
>
> Here I'd like to say that I mentioned both mol and the sun T1 because
> so far we haven't had any discussion on whether any of their
> interfaces are worth copying for the x86 case. Also worth looking at
> would be the work done by IBM for ppc64 and s390, especially the last
> one is prone to be very optimised since their hypervisor work has been
> proven to work for a very long time.
>
> I sure don't mean to diss out both vmware and xen work on the field,
> given the rocky nature of the x86 architecture, but maybe taking a
> look at preexisting work can be a good idea if it hasn't been done
> earlier

Almost nothing from any other architecture makes sense for x86. X86 is
not a virtualizable architecture. It has both classical problems -
sensitive instructions, and also non-reversible CPU state. Hardware
virtualization is now making that easier, but simplifying the OS to
avoid these problems is actually simpler and more efficient. PPC64 and
S390 had the benefit of being designed with virtualization in mind, and
they still have "paravirtualized" kernel architectures when you look at
the lower layers.

Zach

2006-08-03 19:08:12

by Greg KH

Subject: Re: A proposal - binary

On Thu, Aug 03, 2006 at 11:08:04AM -0700, Zachary Amsden wrote:
> Perhaps we can use this to encourage open sourced firmware layers,
> instead of trying to ban drivers which rely on firmware from the
> kernel.

No one is trying to ban such drivers. Well, except the odd people on
debian-legal, but all the kernel developers know to ignore them :)

thanks,

greg k-h

2006-08-03 19:10:40

by Greg KH

Subject: Re: A proposal - binary

On Thu, Aug 03, 2006 at 03:14:21AM -0700, Zachary Amsden wrote:
> I would like to propose an interface for linking a GPL kernel,
> specifically, Linux, against binary blobs.

Sorry, but we aren't lawyers here, we are programmers. Do you have a
patch that shows what you are trying to describe here? Care to post it?

How does this differ from what the Xen developers are proposing?
Why haven't you worked with them to find a solution that everyone likes?

And what about Rusty's proposal that is supposed to be the "middle
ground" between the two competing camps? How does this differ from
that? Why don't you like Rusty's proposal?

Please, start posting code and work together with the other people that
are wanting to achieve the same end goal as you are. That is what really
matters here.

thanks,

greg k-h

2006-08-03 19:14:58

by Zachary Amsden

Subject: Re: A proposal - binary

Greg KH wrote:
> On Thu, Aug 03, 2006 at 11:08:04AM -0700, Zachary Amsden wrote:
>
>> Perhaps we can use this to encourage open sourced firmware layers,
>> instead of trying to ban drivers which rely on firmware from the
>> kernel.
>>
>
> No one is trying to ban such drivers. Well, except the odd people on
> debian-legal, but all the kernel developers know to ignore them :)
>

That is good to know. But there is a kernel option which doesn't make
much sense in that case:

[*] Select only drivers that don't need compile-time external firmware


Zach

2006-08-03 19:26:18

by Zachary Amsden

Subject: Re: A proposal - binary

Greg KH wrote:
> On Thu, Aug 03, 2006 at 03:14:21AM -0700, Zachary Amsden wrote:
>
>> I would like to propose an interface for linking a GPL kernel,
>> specifically, Linux, against binary blobs.
>>
>
> Sorry, but we aren't lawyers here, we are programmers. Do you have a
> patch that shows what you are trying to describe here? Care to post it?
>

<Posts kernel/module.c unmodified>

> How does this differ from what the Xen developers are proposing?
> Why haven't you worked with them to find a solution that everyone likes?
>

We want our backend to provide a greater degree of stability than a pure
source level API as the Xen folks have proposed. We have tried to
convince them that an ABI is in their best interest, but they are
reluctant to commit to one or codesign one at this time.

> And what about Rusty's proposal that is supposed to be the "middle
> ground" between the two competing camps? How does this differ from
> that? Why don't you like Rusty's proposal?
>

Who said that? Please smack them on the head with a broom. We are all
actively working on implementing Rusty's paravirt-ops proposal. It
makes the API vs ABI discussion moot, as it allows for both.

> Please, start posting code and work together with the other people that
> are wanting to achieve the same end goal as you are. That is what really
> matters here.
>

We have already started upstreaming patches. Jeremy, Rusty and I have
sent, or will send, patch sets yesterday or today. We haven't been vocal on LKML,
as we'd just be adding noise. We are working with Rusty and the Xen
developers, and you can see our patchset here:

http://ozlabs.org/~rusty/paravirt/?cl=tip

And follow our development discussions here:

http://lists.osdl.org/pipermail/virtualization/

Zach

2006-08-03 19:40:41

by Greg KH

Subject: Re: A proposal - binary

On Thu, Aug 03, 2006 at 12:14:57PM -0700, Zachary Amsden wrote:
> Greg KH wrote:
> >On Thu, Aug 03, 2006 at 11:08:04AM -0700, Zachary Amsden wrote:
> >
> >>Perhaps we can use this to encourage open sourced firmware layers,
> >>instead of trying to ban drivers which rely on firmware from the
> >>kernel.
> >>
> >
> >No one is trying to ban such drivers. Well, except the odd people on
> >debian-legal, but all the kernel developers know to ignore them :)
> >
>
> That is good to know. But there is a kernel option which doesn't make
> much sense in that case:
>
> [*] Select only drivers that don't need compile-time external firmware

No, that is very different. That option is present if you don't want to
build some firmware images from the source that is present in the kernel
tree, and instead, use the pre-built stuff that is also present in the
kernel tree.

It is there so that we do not require some additional tools that the
majority of kernel developers do not have installed on their machine in
order to create a working kernel image for some types of hardware.

Hope this helps,

greg k-h

2006-08-03 19:49:00

by linux-os (Dick Johnson)

Subject: Re: A proposal - binary


On Thu, 3 Aug 2006, Zachary Amsden wrote:

> Arjan van de Ven wrote:
>> Hi,
>>
>> you use a lot of words for saying something self contradictory. It's
>> very simple; based on your mail, there's no reason the VMI gateway page
>> can't be (also) GPL licensed (you're more than free obviously to dual,
>> tripple or quadruple license it). Once your gateway thing is gpl
>> licensed, your entire proposal is moot in the sense that there is no
>> issue on the license front. See: it can be very easy. Much easier than
>> trying to get a license exception (which is very unlikely you'll get)...
>>
>>
>> Now you can argue for hours about if such an interface is desirable or
>> not, but I didn't think your email was about that.
>>
>
> Arjan, thank you for reading my prolific manifesto. I am not arguing
> for the interface being desirable, and I don't think I'm being self
> contradictory. There was some confusion over technical details of the
> VMI gateway page that I wanted to make explicit. Hopefully I have fully
> explained those. I'm not trying to get a license exemption, I'm trying
> to come up with a model that current and future hardware vendors can
> follow when faced with the same set of circumstances.
>
> It was not 100% clear based on conversations at OLS that open-sourcing
> the VMI layer met the letter and intent of the kernel license model.
> There were some arguments that not having the source integrated into the
> kernel violated the spirit of the GPL by not allowing one to distribute
> a fully working kernel. I wanted to show that is not true, and the
> situation is actually quite unique. Perhaps we can use this to
> encourage open sourced firmware layers, instead of trying to ban drivers
> which rely on firmware from the kernel.
>
> Zach

Inside the kernel there is no protection whatsoever. Whatever executes
inside the kernel, or within the kernel's 'lack-of-protection' ring,
commonly called ring 0, needs to be viewable to anybody who might
be chasing kernel malfunctions. Otherwise, one doesn't know if the
problem is a kernel bug or some accident within a binary blob.

The bottom line is that these problems have nothing to do with
licensing at all. It's not a GPL issue. It's a troubleshooting
issue. There already exists a method of using proprietary modules.
You just don't report "bugs" if they are installed. It's really
just that simple. If you want to use some secret recipe that
can't be seen by kernel troubleshooters, just don't use them
in a kernel that is being debugged. It's really that simple.

Time-and-time-again, I've seen bug-reports with proprietary
video drivers installed. The respondent says; "Duplicate the
problem without that module installed." The result is usually
that the problem can't be duplicated, to wit: no kernel bug,
it's a proprietary driver bug.

Also, again-and-again, I read about some "new" thing that
requires hooks for some binary blob. They come in various
disguises. Certainly kernel developers are smart enough
to see the Trojan Horse just outside the gates. Give it up!

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.62 BogoMips).
New book: http://www.AbominableFirebug.com/

2006-08-03 19:56:34

by Dave Jones

Subject: Re: A proposal - binary

On Thu, Aug 03, 2006 at 12:36:00PM -0700, Greg Kroah-Hartman wrote:

> > That is good to know. But there is a kernel option which doesn't make
> > much sense in that case:
> >
> > [*] Select only drivers that don't need compile-time external firmware
>
> No, that is very different. That option is present if you don't want to
> build some firmware images from the source that is present in the kernel
> tree, and instead, use the pre-built stuff that is also present in the
> kernel tree.

You're describing PREVENT_FIRMWARE_BUILD. The text Zach quoted is from
STANDALONE, which is something else completely. That allows us to not
build drivers that pull in things from /etc and the like during compile.
(Whoever thought that was a good idea?)

Dave

--
http://www.codemonkey.org.uk

2006-08-03 20:04:03

by Greg KH

Subject: Re: A proposal - binary

On Thu, Aug 03, 2006 at 03:56:17PM -0400, Dave Jones wrote:
> On Thu, Aug 03, 2006 at 12:36:00PM -0700, Greg Kroah-Hartman wrote:
>
> > > That is good to know. But there is a kernel option which doesn't make
> > > much sense in that case:
> > >
> > > [*] Select only drivers that don't need compile-time external firmware
> >
> > No, that is very different. That option is present if you don't want to
> > build some firmware images from the source that is present in the kernel
> > tree, and instead, use the pre-built stuff that is also present in the
> > kernel tree.
>
> You're describing PREVENT_FIRMWARE_BUILD. The text Zach quoted is from
> STANDALONE, which is something else completely. That allows us to not
> build drivers that pull in things from /etc and the like during compile.
> (Whoever thought that was a good idea?)

Oops, sorry, you are right. Yeah, some of those ALSA drivers look for
files in /etc when building, very strange...

Either way it's not what I think Zach was thinking it was about.

thanks,

greg k-h

2006-08-03 20:06:11

by Greg KH

Subject: Re: A proposal - binary

On Thu, Aug 03, 2006 at 12:26:16PM -0700, Zachary Amsden wrote:
> Greg KH wrote:
> >On Thu, Aug 03, 2006 at 03:14:21AM -0700, Zachary Amsden wrote:
> >
> >>I would like to propose an interface for linking a GPL kernel,
> >>specifically, Linux, against binary blobs.
> >>
> >
> >Sorry, but we aren't lawyers here, we are programmers. Do you have a
> >patch that shows what you are trying to describe here? Care to post it?
> >
>
> <Posts kernel/module.c unmodified>

If you want to stick with the current kernel module interface, I don't
see why you even need to bring this up, there are no arguments about
that API being in constant flux :)

> >How does this differ with the way that the Xen developers are proposing?
> >Why haven't you worked with them to find a solution that everyone likes?
> >
>
> We want our backend to provide a greater degree of stability than a pure
> source level API as the Xen folks have proposed. We have tried to
> convince them that an ABI is in their best interest, but they are
> reluctant to commit to one or codesign one at this time.

Don't you feel it's a bit early to "commit" to anything yet when we
don't have a working implementation? Things change over time, and it's
one of the main reasons Linux is so successful.

> >And what about Rusty's proposal that is supposed to be the "middle
> >ground" between the two competing camps? How does this differ from
> >that? Why don't you like Rusty's proposal?
> >
>
> Who said that? Please smack them on the head with a broom. We are all
> actively working on implementing Rusty's paravirt-ops proposal. It
> makes the API vs ABI discussion moot, as it allows for both.

So everyone is still skirting the issue, oh great :)

> >Please, start posting code and work together with the other people that
> >are wanting to achieve the same end goal as you are. That is what really
> >matters here.
> >
>
> We have already started upstreaming patches. Jeremy, Rusty and I have
> sent, or will send, patch sets yesterday or today. We haven't been vocal on LKML,
> as we'd just be adding noise. We are working with Rusty and the Xen
> developers, and you can see our patchset here:
>
> http://ozlabs.org/~rusty/paravirt/?cl=tip
>
> And follow our development discussions here:
>
> http://lists.osdl.org/pipermail/virtualization/

I really don't want to follow the discussion unless necessary. I trust
Chris and Rusty to do the right thing in this area...

thanks,

greg k-h

2006-08-03 20:25:47

by Adrian Bunk

Subject: Options depending on STANDALONE

On Thu, Aug 03, 2006 at 03:56:17PM -0400, Dave Jones wrote:
> On Thu, Aug 03, 2006 at 12:36:00PM -0700, Greg Kroah-Hartman wrote:
>
> > > That is good to know. But there is a kernel option which doesn't make
> > > much sense in that case:
> > >
> > > [*] Select only drivers that don't need compile-time external firmware
> >
> > No, that is very different. That option is present if you don't want to
> > build some firmware images from the source that is present in the kernel
> > tree, and instead, use the pre-built stuff that is also present in the
> > kernel tree.
>
> You're describing PREVENT_FIRMWARE_BUILD. The text Zach quoted is from
> STANDALONE, which is something else completely. That allows us to not
> build drivers that pull in things from /etc and the like during compile.
> (Whoever thought that was a good idea?)

We should also look at what drivers do depend on STANDALONE:
- some OSS drivers
- one DVB driver option (DVB_AV7110_FIRMWARE)
- ACPI_CUSTOM_DSDT

The OSS drivers are more or less RIP, so let's ignore them.

Is DVB_AV7110_FIRMWARE really still required?
ALL other drivers work without such an option.

ACPI_CUSTOM_DSDT seems to be the most interesting case.
It's anyway not usable for distribution kernels, and AFAIR the ACPI
people prefer to get the kernel working with all original DSDTs
(which usually work with at least one other OS) than letting the people
workaround the problem by using a custom DSDT.

It might therefore be possible to simply get rid of CONFIG_STANDALONE?

> Dave

cu
Adrian

--

Gentoo kernels are 42 times more popular than SUSE kernels among
KLive users (a service by SUSE contractor Andrea Arcangeli that
gathers data about kernels from many users worldwide).

There are three kinds of lies: Lies, Damn Lies, and Statistics.
Benjamin Disraeli

2006-08-03 20:32:54

by Greg KH

Subject: Re: Options depending on STANDALONE

On Thu, Aug 03, 2006 at 10:25:43PM +0200, Adrian Bunk wrote:
> ACPI_CUSTOM_DSDT seems to be the most interesting case.
> It's anyway not usable for distribution kernels, and AFAIR the ACPI
> people prefer to get the kernel working with all original DSDTs
> (which usually work with at least one other OS) than letting the people
> workaround the problem by using a custom DSDT.

Not true at all. For SuSE kernels, we have a patch that lets people
load a new DSDT from initramfs due to broken machines requiring a
replacement in order to work properly.

thanks,

greg k-h

2006-08-03 20:38:57

by Willy Tarreau

Subject: Re: A proposal - binary

On Thu, Aug 03, 2006 at 02:21:12PM +0100, Alan Cox wrote:
> So how does this differ from the twice yearly recycling of the fixed
> driver ABI discussion ?
>
> We have a facility for loading binary blobs into the kernel built from
> source, its called insmod.

I think that the issue Zach tried to cover is the current inability to
keep the same binary module across multiple kernel versions. That's why
he compared modules<->kernel to ELF<->glibc. In that sense, he's right.
I'm very happy when I find that old binaries I built with gcc-2.7.2 in
1997 still run under my glibc-2.3.6 without any need to rebuild (and
potentially rebuild gcc-2.7.2 first).

> Alan

Willy
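
The kind of compatibility Willy describes (an old binary still running
against a much newer library) is commonly expressed as a versioning
rule. The following is only a minimal sketch of one such rule, assuming
a hypothetical major/minor scheme; `struct abi_version` and
`abi_compatible` are invented names, and this is not how glibc symbol
versioning or the kernel module loader actually work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ABI version: major changes break compatibility,
 * minor changes only add to the interface. */
struct abi_version {
    unsigned major;
    unsigned minor;
};

/* Returns true if a consumer built against `required` can run on a
 * provider that implements `provided`. */
static bool abi_compatible(const struct abi_version *provided,
                           const struct abi_version *required)
{
    assert(provided != NULL && required != NULL);
    return provided->major == required->major &&
           provided->minor >= required->minor;
}
```

Under this rule a module built against 1.2 keeps working on a provider
at 1.5, while a provider at 2.0 is free to break it.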

2006-08-03 20:42:57

by Dave Jones

Subject: Re: Options depending on STANDALONE

On Thu, Aug 03, 2006 at 01:28:07PM -0700, Greg Kroah-Hartman wrote:
> On Thu, Aug 03, 2006 at 10:25:43PM +0200, Adrian Bunk wrote:
> > ACPI_CUSTOM_DSDT seems to be the most interesting case.
> > It's anyway not usable for distribution kernels, and AFAIR the ACPI
> > people prefer to get the kernel working with all original DSDTs
> > (which usually work with at least one other OS) than letting the people
> > workaround the problem by using a custom DSDT.
>
> Not true at all. For SuSE kernels, we have a patch that lets people
> load a new DSDT from initramfs due to broken machines requiring a
> replacement in order to work properly.

Whilst this is a quick fix for users who either know how to hack DSDTs
themselves, or know where to get a fixed one, it doesn't solve the bigger
problem, that the interpreter doesn't get fixed.
And by 'fixed', I mean we aren't bug for bug compatible with that
other OS. We need to be adding workarounds to the ACPI interpreter
so this stuff 'just works', not hiding from the problem and creating
"but it works in $otherdistro when I do this" scenarios.

Dave

--
http://www.codemonkey.org.uk

2006-08-03 20:44:25

by Alan

Subject: Re: A proposal - binary

On Thu, 2006-08-03 at 11:08 -0700, Zachary Amsden wrote:
> encourage open sourced firmware layers, instead of trying to ban drivers
> which rely on firmware from the kernel.

The reasons for pushing downloadable firmware out of the kernel are
manifold and based on legal advice.

MORAL: Many free software people like a clean separation between the
free and non-free components of a system

LEGAL: Some firmware isn't publicly redistributable but comes with the
h/w

LEGAL: Several lawyers have advised people that putting firmware
separate to the kernel is different to embedding it in kernel in terms
of the derivative work question.

TECHNICAL: Unswappable blobs of kernel memory taken up by firmware is
bad generally speaking

TECHNICAL: Pulling 20MB of unchanging firmware with each kernel tree is
annoying


2006-08-03 20:49:54

by Brown, Len

Subject: RE: Options depending on STANDALONE

>On Thu, Aug 03, 2006 at 10:25:43PM +0200, Adrian Bunk wrote:
>> ACPI_CUSTOM_DSDT seems to be the most interesting case.
>> It's anyway not usable for distribution kernels, and AFAIR the ACPI
>> people prefer to get the kernel working with all original DSDTs
>> (which usually work with at least one other OS) than letting
>> the people workaround the problem by using a custom DSDT.
>
>Not true at all. For SuSE kernels, we have a patch that lets people
>load a new DSDT from initramfs due to broken machines requiring a
>replacement in order to work properly.

CONFIG_ACPI_CUSTOM_DSDT allows hackers to debug their system
by building a modified DSDT into the kernel to over-ride what
came with the system. It would make no sense for a distro
to use it, unless the distro were shipping only on 1 model machine.
This technique is necessary for debugging, but makes no
sense for production.

The initramfs method shipped by SuSE is more flexible, allowing
the hacker to stick the DSDT image in the initrd and use it
without re-compiling the kernel.

I have refused to accept the initrd patch into Linux many times,
and always will.

I've advised SuSE many times that they should not be shipping it,
as it means that their supported OS is running on modified firmware --
which, by definition, they can not support. Indeed, one could view
this method as counter-productive to the evolution of Linux --
since it is our stated goal to run on the same machines that Windows
runs on -- without requiring customers to modify those machines
to run Linux.

-Len

2006-08-03 20:53:34

by Alan

Subject: Re: A proposal - binary

On Thu, 2006-08-03 at 22:29 +0200, Willy Tarreau wrote:
> I think that the issue Zach tried to cover is the current inability to
> keep the same binary module across multiple kernel versions. That's why
> he compared modules<->kernel to ELF<->glibc. In that sense, he's right.

I think that's why he's wrong.

The interface for a hypedvisor is

Kernel -> Something -> Hypedvisor

The kernel->something interface can change randomly by day of week, who
cares. A better analogy would be a device driver - we recompile device
drivers each kernel variant, which change their internal interfaces, we
redesign their locking but we don't have to change the hardware.

Ditto talking to the hypedvisor. The ABI is the hypedvisor syscall/trap
interface not the kernel module interface. As such insmod is just fine.

Alan
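
The layering Alan describes (Kernel -> Something -> Hypervisor) can be
illustrated with a small userspace sketch in C. It assumes a
hypothetical hypervisor whose call numbers are frozen once published;
`hv_call`, `hypercall`, `glue_set_pte` and `glue_flush_tlb` are invented
for illustration and are not taken from VMI, Xen, or the kernel. The
trap into the hypervisor is replaced by a function that just records its
arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hypervisor ABI: these numbers are the stable contract. */
enum hv_call {
    HV_CALL_SET_PTE   = 1,
    HV_CALL_FLUSH_TLB = 2,
};

/* Stand-in for the trap into the hypervisor. It only records the call
 * so the effect can be observed from userspace. */
static uint64_t hv_last_call;
static uint64_t hv_last_arg;

static long hypercall(uint64_t nr, uint64_t arg)
{
    assert(nr == HV_CALL_SET_PTE || nr == HV_CALL_FLUSH_TLB);
    hv_last_call = nr;
    hv_last_arg = arg;
    return 0;
}

/* The "Something" layer: rebuilt with every kernel, so its
 * kernel-facing signatures may change freely. Only the numbers and
 * argument layout passed to hypercall() have to stay fixed. */
static long glue_set_pte(uint64_t pte)
{
    return hypercall(HV_CALL_SET_PTE, pte);
}

static long glue_flush_tlb(void)
{
    return hypercall(HV_CALL_FLUSH_TLB, 0);
}
```

In this picture the glue functions play the role of a device driver and
the `hv_call` numbers play the role of the hardware registers: the
former can be recompiled and redesigned, the latter cannot.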

2006-08-03 20:56:08

by Greg KH

Subject: Re: Options depending on STANDALONE

On Thu, Aug 03, 2006 at 04:49:08PM -0400, Brown, Len wrote:
> I've advised SuSE many times that they should not be shipping it,
> as it means that their supported OS is running on modified firmware --
> which, by definition, they can not support. Indeed, one could view
> this method as counter-productive to the evolution of Linux --
> since it is our stated goal to run on the same machines that Windows
> runs on -- without requiring customers to modify those machines
> to run Linux.

Ok, if it's your position that we should not support this, I'll see what
I can do to remove it from our kernel tree...

If there are any other patches that we are carrying that you (or anyone
else) feel we should not be, please let me know.

thanks,

greg k-h

2006-08-03 21:03:53

by Dave Jones

Subject: Re: Options depending on STANDALONE

On Thu, Aug 03, 2006 at 01:51:27PM -0700, Greg Kroah-Hartman wrote:
> On Thu, Aug 03, 2006 at 04:49:08PM -0400, Brown, Len wrote:
> > I've advised SuSE many times that they should not be shipping it,
> > as it means that their supported OS is running on modified firmware --
> > which, by definition, they can not support. Indeed, one could view
> > this method as counter-productive to the evolution of Linux --
> > since it is our stated goal to run on the same machines that Windows
> > runs on -- without requiring customers to modify those machines
> > to run Linux.
>
> Ok, if it's your position that we should not support this, I'll see what
> I can do to remove it from our kernel tree...
>
> If there are any other patches that we are carrying that you (or anyone
> else) feel we should not be, please let me know.

It's somewhat hard to tell when the source rpm's don't match the binaries.
See ftp://ftp.suse.com/pub/projects/kernel/kotd/x86_64/HEAD for example,
and notice the lack of 2.6.18rc3 source, just 2.6.16. Or am I looking
in the wrong place ? (The other arch's all seem to suffer this curious problem).

Dave




--
http://www.codemonkey.org.uk

2006-08-03 21:27:28

by Zachary Amsden

Subject: Re: A proposal - binary

Alan Cox wrote:
> On Thu, 2006-08-03 at 22:29 +0200, Willy Tarreau wrote:
>
>> I think that the issue Zach tried to cover is the current inability to
>> keep the same binary module across multiple kernel versions. That's why
>> he compared modules<->kernel to ELF<->glibc. In that sense, he's right.
>>
>
> I think that's why he's wrong.
>
> The interface for a hypedvisor is
>
> Kernel -> Something -> Hypedvisor
>
> The kernel->something interface can change randomly by day of week, who
> cares. A better analogy would be a device driver - we recompile device
> drivers each kernel variant, which change their internal interfaces, we
> redesign their locking but we don't have to change the hardware.
>
> Ditto talking to the hypedvisor. The ABI is the hypedvisor syscall/trap
> interface not the kernel module interface. As such insmod is just fine.
>

Yes, the module issue is completely tangential. We would like to have
the ability to load a hypervisor module at run-time, and this may be
slightly nicer from a GPL point of view, by allowing us to publish a GPL
module that interfaces to the kernel. But the Something layer really is
more like firmware, and merely making a GPL'd module interface to it
doesn't actually change the underlying legal / technical ramifications
that Alan pointed out.

Zach

2006-08-03 21:41:28

by Zachary Amsden

Subject: Re: A proposal - binary

Greg KH wrote:
> On Thu, Aug 03, 2006 at 12:26:16PM -0700, Zachary Amsden wrote:
>
>> Greg KH wrote:
>>
>>> Sorry, but we aren't lawyers here, we are programmers. Do you have a
>>> patch that shows what you are trying to describe here? Care to post it?
>>>
>>>
>> <Posts kernel/module.c unmodified>
>>
>
> If you want to stick with the current kernel module interface, I don't
> see why you even need to bring this up, there are no arguments about
> that API being in constant flux :)
>

Hence my point follows. Using source compiled with the kernel as a
module does nothing to provide a stable interface to the backend
hardware / hypervisor implementation.

>
>>> How does this differ with the way that the Xen developers are proposing?
>>> Why haven't you worked with them to find a solution that everyone likes?
>>>
>>>
>> We want our backend to provide a greater degree of stability than a pure
>> source level API as the Xen folks have proposed. We have tried to
>> convince them that an ABI is in their best interest, but they are
>> reluctant to commit to one or codesign one at this time.
>>
>
> Don't you feel it's a bit early to "commit" to anything yet when we
> don't have a working implementation? Things change over time, and it's
> one of the main reasons Linux is so successful.
>

We have a working implementation of an ABI that interfaces to both ESX
and Xen. I have no argument that things change over time, and that
helps Linux to be successful and adaptive. But the fact that things
change so much over time is the problem. Distributing a hypervisor that
runs only one particular version of some hacked up kernel is almost
useless from a commercial standpoint. Most end users get their kernels
from some distro, and these kernels have a long lifetime, while the
release cycle for an entire OS distribution is getting much longer.
During that time, the hypervisor and the kernel will diverge. It is not
a question of if - it is a question of how much. A well designed ABI
slows that divergence to a pace that allows some hope of compatibility.
An ad-hoc source layer compatibility does not.


>>> And what about Rusty's proposal that is supposed to be the "middle
>>> ground" between the two competing camps? How does this differ from
>>> that? Why don't you like Rusty's proposal?
>>>
>>>
>> Who said that? Please smack them on the head with a broom. We are all
>> actively working on implementing Rusty's paravirt-ops proposal. It
>> makes the API vs ABI discussion moot, as it allows for both.
>>
>
> So everyone is still skirting the issue, oh great :)
>

There are many nice things about Rusty's proposal that allow for a
simpler and cleaner interface to Linux - for example, getting rid of the
need to have a special hypervisor sub-architecture, and allowing
non-performance critical operations to be indirected for flexibility of
implementation.

The Xen ABI as it stands today is not cleanly done - it assumes too many
details about how the hypervisor will operate, and is purely a
hypervisor specific ABI. We and other vendors would have to bend over
backwards and jump through flaming hoops while holding our breath in a
cloud of sulfurous gas to implement that interface. And there is no
guarantee that the interface will remain stable and compatible. So it
is really a non-starter.

VMI as it stands today is hypervisor independent, and provides a much
more stable and compatible interface. Although I believe it is possible
that it could work for other vendors than just VMware and Xen, those
other vendors could have their own, proprietary, single purpose ABI that
is either deliberately hypervisor specific or accidentally so from a
lack of forethought. <Zach removes soapbox>.

As you mention, Linux is adaptive and will change going forward.
Rusty's proposal allows a way to accommodate that change until such a
time as a true vendor independent hypervisor agnostic ABI evolves.
Hopefully in time it will - but that is not the case today, despite our
sincere efforts to make it happen. In fairness, we could have been more
public and forthcoming about our interface, but we also have not
received significant cooperation or collaboration on our efforts until
the work on paravirt-ops began. With the code coming into public
scrutiny and the goal of working together, perhaps our model can serve
as a blueprint, or at least a rough draft of what a long term stable ABI
will look like. But that is neither here nor there until the code is
working and ready to go. Paravirt-ops provides a nice house for this -
it lets us all speak the same language in Linux, even if we have to
phone our hypervisor in Sanskrit and Xen speaks in Latin. Creating a
common lingua franca is still an excellent goal, but we can't predict
the future. Hopefully, nobody starts using smoke signals. In the end,
paravirt-ops allows us all to play on the same field, and either a
unified hypervisor independent solution will win, or the hypervisor
interfaces will fragment, and Linux will still have an interface that
allows it to run on multiple hypervisors. Either way, Linux gets more
functionality and better performance in more environments, which is the
real win.

Zach
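
The indirection Zach attributes to the paravirt-ops approach can be
sketched as a table of function pointers that is filled in once and then
used by the rest of the kernel. This is a minimal userspace illustration
under stated assumptions, not the real paravirt-ops structure:
`struct pv_ops_sketch`, the `native_*` and `hv_*` backends, and
`pv_select` are all hypothetical names, and the interrupt flag is
modelled as a plain variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; the real interface has many more hooks. */
struct pv_ops_sketch {
    const char *name;
    void (*irq_disable)(void);
    void (*irq_enable)(void);
};

/* Backend 1: bare hardware. The variable stands in for the CPU's
 * interrupt flag. */
static int native_if = 1;
static void native_irq_disable(void) { native_if = 0; }
static void native_irq_enable(void)  { native_if = 1; }

/* Backend 2: an invented hypervisor that tracks a virtual interrupt
 * flag and counts how often the guest calls into it. */
static int hv_vcpu_if = 1;
static int hv_calls;
static void hv_irq_disable(void) { hv_vcpu_if = 0; hv_calls++; }
static void hv_irq_enable(void)  { hv_vcpu_if = 1; hv_calls++; }

static const struct pv_ops_sketch native_ops = {
    "native", native_irq_disable, native_irq_enable
};
static const struct pv_ops_sketch hv_ops = {
    "hv", hv_irq_disable, hv_irq_enable
};

/* The kernel proper only ever calls through this pointer. */
static const struct pv_ops_sketch *pv = &native_ops;

static void pv_select(const struct pv_ops_sketch *ops)
{
    assert(ops != NULL);
    pv = ops;
}

/* Generic kernel code: identical source and binary for every backend. */
static void critical_section(void)
{
    pv->irq_disable();
    /* ... work that must not be interrupted ... */
    pv->irq_enable();
}
```

Whether a backend such as `hv_ops` is compiled from source with the
kernel or talks to a fixed binary interface underneath is invisible to
`critical_section()`, which is the sense in which the indirection allows
for both an API and an ABI.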

2006-08-03 21:46:40

by Greg KH

Subject: Re: Options depending on STANDALONE

On Thu, Aug 03, 2006 at 05:01:30PM -0400, Dave Jones wrote:
> On Thu, Aug 03, 2006 at 01:51:27PM -0700, Greg Kroah-Hartman wrote:
> > On Thu, Aug 03, 2006 at 04:49:08PM -0400, Brown, Len wrote:
> > > I've advised SuSE many times that they should not be shipping it,
> > > as it means that their supported OS is running on modified firmware --
> > > which, by definition, they can not support. Indeed, one could view
> > > this method as counter-productive to the evolution of Linux --
> > > since it is our stated goal to run on the same machines that Windows
> > > runs on -- without requiring customers to modify those machines
> > > to run Linux.
> >
> > Ok, if it's your position that we should not support this, I'll see what
> > I can do to remove it from our kernel tree...
> >
> > If there are any other patches that we are carrying that you (or anyone
> > else) feel we should not be, please let me know.
>
> It's somewhat hard to tell when the source rpm's don't match the binaries.
> See ftp://ftp.suse.com/pub/projects/kernel/kotd/x86_64/HEAD for example,
> and notice the lack of 2.6.18rc3 source, just 2.6.16. Or am I looking
> in the wrong place ? (The other arch's all seem to suffer this curious problem).

Bleah, our KOTD build is still broken...

We do have a 2.6.18rc3 kernel, and everything rebased on that, but it's
not getting out to the world just yet for some odd reason. It will show
up in the next Opensuse 10.2 Alpha release some time next week, but it
should be mirroring nightly too.

/me goes off to kick the build system

thanks,

greg k-h

2006-08-03 22:13:52

by Alan

Subject: Re: A proposal - binary

On Thu, 2006-08-03 at 14:41 -0700, Zachary Amsden wrote:
> Hence my point follows. Using source compiled with the kernel as a
> module does nothing to provide a stable interface to the backend
> hardware / hypervisor implementation.

Could have fooled me. It seems to work for the IBM Mainframe people
really well.

2006-08-03 22:31:39

by Zachary Amsden

Subject: Re: A proposal - binary

Alan Cox wrote:
> On Thu, 2006-08-03 at 14:41 -0700, Zachary Amsden wrote:
>
>> Hence my point follows. Using source compiled with the kernel as a
>> module does nothing to provide a stable interface to the backend
>> hardware / hypervisor implementation.
>>
>
> Could have fooled me. It seems to work for the IBM Mainframe people
> really well.
>

Yes, but not because of source compatibility. It works because the
hypervisor layer is actually architected in the hardware.

Zach

2006-08-03 22:35:17

by Greg KH

Subject: Re: A proposal - binary

On Thu, Aug 03, 2006 at 11:33:03PM +0100, Alan Cox wrote:
> On Thu, 2006-08-03 at 14:41 -0700, Zachary Amsden wrote:
> > Hence my point follows. Using source compiled with the kernel as a
> > module does nothing to provide a stable interface to the backend
> > hardware / hypervisor implementation.
>
> Could have fooled me. It seems to work for the IBM Mainframe people
> really well.

And the PowerPC hypervisor interface :)

Have you discussed this with those two groups to make sure you aren't
doing something that would merely duplicate what they have already done?

thanks,

greg k-h

2006-08-03 22:50:18

by Zachary Amsden

Subject: Re: A proposal - binary

Greg KH wrote:
> And the PowerPC hypervisor interface :)
>
> Have you discussed this with those two groups to make sure you aren't
> doing something that would merely duplicate what they have already done?
>

I haven't personally.

There's nothing anybody has done that can be considered sufficient to
address virtualizing i386. Most of these other architectures have a
prom / lpar / hypervisor support layer already, which is what we are
trying to create for i386. And it is just about 100% architecture
specific because of the weird non-virtualizable parts of x86. Page
tables are completely different beasts when you are using a hashed page
table scheme versus hardware page tables, so there is not even common
ground in the MMU. About the only common ground would be the cede /
prod for remote notifications, but it is all so architecture dependent
that I really think any idea of creating a common cross architecture
hypervisor layer is just impossible at this time.

We need to focus on establishing that lower layer interface for i386
instead of trying to come up with the grand unified hypervisor
interface, which could be years away. For now, I think it's fair to say
there is about zero duplication, and any that we find along the way can
go into common Linux interfaces.

Zach

2006-08-03 23:11:19

by Alan

Subject: Re: A proposal - binary

On Thu, 2006-08-03 at 15:31 -0700, Zachary Amsden wrote:
> Alan Cox wrote:
> > Could have fooled me. It seems to work for the IBM Mainframe people
> > really well.
> Yes, but not because of source compatibility. It works because the
> hypervisor layer is actually architected in the hardware.

The hardware has nothing to do with it. It works because the hypervisor
API has a spec and is maintained compatibly. It's not entirely hardware
architected either, it has chunks of interfaces that are not present at the
hardware level or not meaningful at that level - the paging assists for
example are purely a hypervisor interface, as are HiperSockets.

2006-08-03 23:40:27

by Trent Piepho

Subject: Re: [v4l-dvb-maintainer] Options depending on STANDALONE

On Thu, 3 Aug 2006, Adrian Bunk wrote:
> On Thu, Aug 03, 2006 at 03:56:17PM -0400, Dave Jones wrote:
> > You're describing PREVENT_FIRMWARE_BUILD. The text Zach quoted is from
> > STANDALONE, which is something else completely. That allows us to not
> > build drivers that pull in things from /etc and the like during compile.
> > (Whoever thought that was a good idea?)
>
>
> Is DVB_AV7110_FIRMWARE really still required?
> ALL other drivers work without such an option.

The other DVB drivers that need firmware load it when the device is opened
or used (i.e. a channel is tuned). At least for the ones I'm familiar
with. If they are compiled directly into the kernel, they can still use
FW_LOADER since the loading won't happen until well after booting is
done.

For AV7110, it looks like the firmware loading is done when the driver is
first initialized. If AV7110 is compiled into the kernel, FW_LOADER cannot
be used, because the filesystem with the firmware won't be mounted yet.

So AV7110 has an option to compile a firmware file into the driver.
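
The difference between the two loading strategies can be sketched in plain
userspace C. This is only a simulation under stated assumptions: it does not
use the kernel's firmware loader, and every name in it (rootfs_mounted,
load_at_init, load_at_open, the byte arrays) is hypothetical rather than
taken from any DVB driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fw_source { FW_NONE, FW_BUILTIN, FW_FILESYSTEM };

struct fw_blob {
    const unsigned char *data;
    size_t len;
    enum fw_source source;
};

/* Hypothetical stand-in for "has the filesystem holding the firmware
 * been mounted yet?" */
static bool rootfs_mounted = false;

/* Hypothetical firmware images: one compiled in, one "on disk". */
static const unsigned char builtin_fw[] = { 0xde, 0xad, 0xbe, 0xef };
static const unsigned char fs_fw[] = { 0xca, 0xfe };

/* Stand-in for a filesystem-backed loader: fails until the fs is mounted. */
static bool load_fw_from_fs(struct fw_blob *fw)
{
    assert(fw != NULL);
    if (!rootfs_mounted)
        return false;
    fw->data = fs_fw;
    fw->len = sizeof(fs_fw);
    fw->source = FW_FILESYSTEM;
    return true;
}

/* Init-time loading: without a compiled-in image, a built-in driver
 * has nowhere to get its firmware from this early. */
static struct fw_blob load_at_init(bool have_builtin_fw)
{
    struct fw_blob fw = { NULL, 0, FW_NONE };

    if (have_builtin_fw) {
        fw.data = builtin_fw;
        fw.len = sizeof(builtin_fw);
        fw.source = FW_BUILTIN;
    } else {
        load_fw_from_fs(&fw);
    }
    return fw;
}

/* Open-time loading: deferred until the device is first used, by which
 * point booting is normally finished and the filesystem is available. */
static struct fw_blob load_at_open(void)
{
    struct fw_blob fw = { NULL, 0, FW_NONE };

    load_fw_from_fs(&fw);
    return fw;
}
```

Before the filesystem is mounted, load_at_init(false) comes back empty while
load_at_init(true) returns the compiled-in image; once rootfs_mounted is set,
load_at_open() succeeds without needing a built-in copy.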

2006-08-03 23:40:55

by Zachary Amsden

Subject: Re: A proposal - binary

Alan Cox wrote:
> On Thu, 2006-08-03 at 15:31 -0700, Zachary Amsden wrote:
>
>> Alan Cox wrote:
>>
>>> Could have fooled me. It seems to work for the IBM Mainframe people
>>> really well.
>>>
>> Yes, but not because of source compatibility. It works because the
>> hypervisor layer is actually architected in the hardware.
>>
>
> The hardware has nothing to do with it. It works because the hypervisor
> API has a spec and is maintained compatibly. It's not entirely hardware
> architected either, it has chunks of interfaces that are not present at the
> hardware level or not meaningful at that level - the paging assists for
> example are purely a hypervisor interface, as are HiperSockets.
>

Yes, it's part hardware, part software, a pseudo firmwarish mesh of the
two. When you've got multiple vendors involved, you need to paint the
interface with a broader stroke so that the individual details of each
don't get so involved, and you can do that in many different ways. You
still need an ABI, not just an API, though, for future compatibility of
existing kernels, which is important to mainframe customers for
migration as well as virtual machine users.

But I think we're running off into the weeds picking nits here..

Zach

2006-08-04 02:52:38

by Jeremy Fitzhardinge

Subject: Re: A proposal - binary

Greg KH wrote:
> On Thu, Aug 03, 2006 at 12:26:16PM -0700, Zachary Amsden wrote:
>
>> Who said that? Please smack them on the head with a broom. We are all
>> actively working on implementing Rusty's paravirt-ops proposal. It
>> makes the API vs ABI discussion moot, as it allows for both.
>>
>
> So everyone is still skirting the issue, oh great :)
>

I don't really think there's an issue to be skirted here. The current
plan is to design and implement a paravirt_ops interface, which is a
typical Linux source-level interface between the bulk of the kernel and
a set of hypervisor-specific backends. Xen, VMWare and other interested
parties are working together on this interface to make sure it meets
everyone's needs (and if you have another hypervisor you'd like to
support with this interface, we want to hear from you).

Until VMWare proposed VMI, Xen was the only hypervisor needing support,
so it was reasonable that the Xen patches just go straight to Xen. But
with paravirtops the result will be more flexible, since a kernel will
be configurable to run on any combination of supported hypervisor or on
bare hardware.

As far as I'm concerned, the issue of whether VMI has a stable ABI or
not is one which is on the VMI side of the paravirtops interface, and it
doesn't have any wider implications.

Certainly Xen will maintain a backwards compatible hypervisor interface
for as long as we want/need to, but that's a matter for our side of
paravirtops. And the paravirtops interface will change over time as the
kernel does, and the backends will be adapted to match, either using the
same ABI to the underlying hypervisor, or an expanded one, or whatever;
it doesn't matter as far as the rest of the kernel is concerned.

There's the other question of whether VMI is a suitable interface for
Xen, making the whole paravirt_ops exercise redundant. Zach and VMWare
are claiming to have a VMI binding to Xen which is full featured with
good performance. That's an interesting claim, and I don't doubt that
it's somewhat true. However, they haven't released either code for this
interface or detailed performance results, so it's hard to evaluate. And
with anything in this area, it's always the details that matter: what
tests, on what hardware, at what scale? Does VMI really expose all of
Xen's features, or does it just use a bare-minimum subset to get things
going? And how does the interface fit with short and long term design
goals?

I don't think anybody is willing to answer these questions with any
confidence. VMWare's initial VMI proposal was very geared towards their
particular hypervisor architecture; it has been modified over time to be
a little closer to Xen's model, in order to efficiently support the Xen
binding. But Xen and ESX have very different designs and underlying
philosophies, so I wouldn't expect a single interface to fit comfortably
with either.

As far as LKML is concerned, the only interface which matters is the
Linux -> <something> interface, which is defined within the scope of the
Linux development process. That's what paravirt_ops is intended to be.

And being a Linux API, paravirt_ops can avoid duplicating other Linux
interfaces. For example, VMI, like the Xen hypervisor interface, needs
various ways to deal with time. The rest of the kernel needn't know or
care about those interfaces, because the paravirt backend for each can
also register a clocksource, or use other kernel APIs to expose that
interface (some of which we'll probably develop/expand over time as
needed, but in the normal way kernel interfaces change).

J

2006-08-04 04:19:13

by Andrew Morton

Subject: Re: A proposal - binary

On Thu, 03 Aug 2006 19:52:40 -0700
Jeremy Fitzhardinge <[email protected]> wrote:

> Greg KH wrote:
> > On Thu, Aug 03, 2006 at 12:26:16PM -0700, Zachary Amsden wrote:
> >
> >> Who said that? Please smack them on the head with a broom. We are all
> >> actively working on implementing Rusty's paravirt-ops proposal. It
> >> makes the API vs ABI discussion moot, as it allows for both.
> >>
> >
> > So everyone is still skirting the issue, oh great :)
> >

A reasonable summary. A few touchups:

> I don't really think there's an issue to be skirted here. The current
> plan is to design and implement a paravirt_ops interface, which is a
> typical Linux source-level interface between the bulk of the kernel and
> a set of hypervisor-specific backends. Xen, VMWare and other interested
> parties are working together on this interface to make sure it meets
> everyone's needs (and if you have another hypervisor you'd like to
> support with this interface, we want to hear from you).
>
> Until VMWare proposed VMI, Xen was the only hypervisor needing support,
> so it was reasonable that the Xen patches just go straight to Xen.

No, even if vmware wasn't on the scene, the proposal to make the
Linux->hypervisor interface be specific to one hypervisor implementation is
a concern. That would remain true if vmware were to suddenly vanish.
It is a major interface, and interfaces are a major issue.

> But
> with paravirtops the result will be more flexible, since a kernel will
> be configurable to run on any combination of supported hypervisor or on
> bare hardware.
>
> As far as I'm concerned, the issue of whether VMI has a stable ABI or
> not is one which is on the VMI side of the paravirtops interface, and it
> doesn't have any wider implications.
>
> Certainly Xen will maintain a backwards compatible hypervisor interface
> for as long as we want/need to, but that's a matter for our side of
> paravirtops. And the paravirtops interface will change over time as the
> kernel does, and the backends will be adapted to match, either using the
> same ABI to the underlying hypervisor, or an expanded one, or whatever;
> it doesn't matter as far as the rest of the kernel is concerned.
>
> There's the other question of whether VMI is a suitable interface for
> Xen, making the whole paravirt_ops exercise redundant. Zach and VMWare
> are claiming to have a VMI binding to Xen which is full featured with
> good performance. That's an interesting claim, and I don't doubt that
> it's somewhat true. However, they haven't released either code for this
> interface or detailed performance results, so it's hard to evaluate.

That was a major goofup from a kernel-development-process POV. They're
working hard to get that code out to us.

> And
> with anything in this area, it's always the details that matter: what
> tests, on what hardware, at what scale? Does VMI really expose all of
> Xen's features, or does it just use a bare-minimum subset to get things
> going? And how does the interface fit with short and long term design
> goals?

This is a key issue and to some extent all bets are off until that code
emerges. Because it could be that the VMI->Xen implementation works well,
and that any present shortcomings can be resolved with acceptable effort.

If that happens, it puts a cloud over paravirtops.

But we just don't know any of this until we can get that code into the
right people's hands.

> I don't think anybody is willing to answer these questions with any
> confidence. VMWare's initial VMI proposal was very geared towards their
> particular hypervisor architecture; it has been modified over time to be
> a little closer to Xen's model, in order to efficiently support the Xen
> binding. But Xen and ESX have very different designs and underlying
> philosophies, so I wouldn't expect a single interface to fit comfortably
> with either.

Maybe, maybe not. Until we have an implementation to poke at this is all
speculation. And it is most regrettable that we're being put in a position
where we are forced to speculate.

> As far as LKML is concerned, the only interface which matters is the
> Linux -> <something> interface, which is defined within the scope of the
> Linux development process. That's what paravirt_ops is intended to be.

I must confess that I still don't "get" paravirtops. AFAICT the VMI
proposal, if it works, will make that whole layer simply go away. Which
is attractive. If it works.

> And being a Linux API, paravirt_ops can avoid duplicating other Linux
> interfaces. For example, VMI, like the Xen hypervisor interface, needs
> various ways to deal with time. The rest of the kernel needn't know or
> care about those interfaces, because the paravirt backend for each can
> also register a clocksource, or use other kernel APIs to expose that
> interface (some of which we'll probably develop/expand over time as
> needed, but in the normal way kernel interfaces change).
>

2006-08-04 05:04:41

by Rusty Russell

Subject: Re: A proposal - binary

On Thu, 2006-08-03 at 21:18 -0700, Andrew Morton wrote:
> > As far as LKML is concerned, the only interface which matters is the
> > Linux -> <something> interface, which is defined within the scope of the
> > Linux development process. That's what paravirt_ops is intended to be.
>
> I must confess that I still don't "get" paravirtops. AFAICT the VMI
> proposal, if it works, will make that whole layer simply go away. Which
> is attractive. If it works.

Everywhere in the kernel where we have multiple implementations we want
to select at runtime, we use an ops struct. Why should the choice of
Xen/VMI/native/other be any different?

Yes, we could force native and Xen to work via VMI, but the result would
be less clear, less maintainable, and gratuitously different from
elsewhere in the kernel. And, of course, unlike paravirt_ops where we
can change and add ops at any time, we can't similarly change the VMI
interface because it's an ABI (that's the point: the hypervisor can
provide the implementation).

I hope that clarifies,
Rusty.
--
Help! Save Australia from the worst of the DMCA: http://linux.org.au/law

2006-08-04 05:38:44

by Chris Wright

Subject: Re: A proposal - binary

* Andrew Morton ([email protected]) wrote:
> I must confess that I still don't "get" paravirtops. AFAICT the VMI
> proposal, if it works, will make that whole layer simply go away. Which
> is attractive. If it works.

Paravirtops is simply a table of functions which are populated by the
hypervisor specific code at start-of-day. Some care is taken to patch
up callsites which are performance sensitive. The main difference is
the API vs. ABI distinction. In paravirt ops case, the ABI is defined at
compile time from source. The VMI takes it one step further and fixes
the ABI. That last step is a big one.

There are two basic issues. 1) what is the interface between the kernel
and the glue to a hypervisor. 2) how does one call from the kernel into
the glue layer.

Getting bogged down in #2, the details of the calling convention, is a
distraction from the real issue, #1. We are trying to actually find an
API that is useful for multiple projects. Paravirt_ops gives the
flexibility to evolve the interface.
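
To make the "table of functions populated at start-of-day" idea concrete,
here is a minimal userspace sketch. It is not taken from the paravirt
patches: the struct name, the two ops, the backend and the simulated state
are all made up for illustration, and the callsite patching mentioned above
is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; the real paravirt_ops has different members. */
struct pv_ops_sketch {
    const char *name;
    void (*irq_disable)(void);
    void (*irq_enable)(void);
};

/* Simulated machine state, so the sketch can run in userspace. */
static int irqs_enabled = 1;
static int hypercalls;

/* "Native" implementations: stand-ins for executing cli/sti directly. */
static void native_irq_disable(void) { irqs_enabled = 0; }
static void native_irq_enable(void)  { irqs_enabled = 1; }

/* Backend implementations: stand-ins for asking a hypervisor instead. */
static void hv_irq_disable(void) { hypercalls++; irqs_enabled = 0; }
static void hv_irq_enable(void)  { hypercalls++; irqs_enabled = 1; }

/* The table defaults to native, so bare hardware needs no setup. */
static struct pv_ops_sketch pv_ops = {
    .name = "native",
    .irq_disable = native_irq_disable,
    .irq_enable = native_irq_enable,
};

static const struct pv_ops_sketch hv_backend = {
    .name = "hypothetical-hv",
    .irq_disable = hv_irq_disable,
    .irq_enable = hv_irq_enable,
};

/* Start-of-day: a detected hypervisor backend overwrites the table.
 * Passing NULL means "nothing detected, stay native". */
static void pv_init(const struct pv_ops_sketch *backend)
{
    if (backend == NULL)
        return;
    assert(backend->irq_disable != NULL && backend->irq_enable != NULL);
    pv_ops = *backend;
}

/* The bulk of the kernel only ever calls through the table. */
static void irq_disable_sketch(void) { pv_ops.irq_disable(); }
static void irq_enable_sketch(void)  { pv_ops.irq_enable(); }
```

Because this table is ordinary source, adding or changing an op is just a
patch to the struct and its users. Freezing the layout and calling
convention so that an external blob can fill it in is the extra step that
turns it into an ABI.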

2006-08-04 05:54:21

by Andrew Morton

Subject: Re: A proposal - binary

On Fri, 04 Aug 2006 15:04:35 +1000
Rusty Russell <[email protected]> wrote:

> On Thu, 2006-08-03 at 21:18 -0700, Andrew Morton wrote:
> > > As far as LKML is concerned, the only interface which matters is the
> > > Linux -> <something> interface, which is defined within the scope of the
> > > Linux development process. That's what paravirt_ops is intended to be.
> >
> > I must confess that I still don't "get" paravirtops. AFAICT the VMI
> > proposal, if it works, will make that whole layer simply go away. Which
> > is attractive. If it works.
>
> Everywhere in the kernel where we have multiple implementations we want
> to select at runtime, we use an ops struct. Why should the choice of
> Xen/VMI/native/other be any different?

VMI is being proposed as an appropriate way to connect Linux to Xen. If
that is true then no other glue is needed.

The central point here is whether that is right.

> Yes, we could force native and Xen to work via VMI, but the result would
> be less clear, less maintainable, and gratuitously different from
> elsewhere in the kernel.

I suspect others would disagree with that. We're at the stage of needing
to see code to settle this.

> And, of course, unlike paravirt_ops where we
> can change and add ops at any time, we can't similarly change the VMI
> interface because it's an ABI (that's the point: the hypervisor can
> provide the implementation).

hm. Dunno. ABIs can be uprevved. Perhaps.

2006-08-04 06:24:54

by Jan Engelhardt

Subject: Re: A proposal - binary

>
>Also, again-and-again, I read about some "new" thing that
>requires hooks for some binary blob. They come in various
>disguises. Certainly kernel developers are smart enough
>to see the Trojan Horse just outside the gates. Give it up!

http://www.museumreplicas.com/imagelib/0100064_l_000.jpg

You know what happened to _that_ 'horse'... ;-)




Jan Engelhardt
--

2006-08-04 06:28:29

by Antonio Vargas

Subject: Re: A proposal - binary

On 8/4/06, Chris Wright <[email protected]> wrote:
> * Andrew Morton ([email protected]) wrote:
> > I must confess that I still don't "get" paravirtops. AFAICT the VMI
> > proposal, if it works, will make that whole layer simply go away. Which
> > is attractive. If it works.
>
> Paravirtops is simply a table of functions which are populated by the
> hypervisor specific code at start-of-day. Some care is taken to patch
> up callsites which are performance sensitive. The main difference is
> the API vs. ABI distinction. In paravirt ops case, the ABI is defined at
> compile time from source. The VMI takes it one step further and fixes
> the ABI. That last step is a big one.
>
> There are two basic issues. 1) what is the interface between the kernel
> and the glue to a hypervisor. 2) how does one call from the kernel into
> the glue layer.
>
> Getting bogged down in #2, the details of the calling convention, is a
> distraction from the real issue, #1. We are trying to actually find an
> API that is useful for multiple projects. Paravirt_ops gives the
> flexibility to evolve the interface.

One feature I found missing at the paravirt patches is to allow the
user to forbid the use of paravirtualization of certain features (via
a bitmask on the kernel commandline for example) so that the execution
drops into the native hardware virtualization system. Such a feature
would provide a big upwards compatibility for the kernel<->hypervisor
system. The case for this would be needing to forcefully upgrade the
hypervisor due to security issues and finding out that the hypervisor
is incompatible at the paravirtualization level, then the user would
be at least capable of continuing to run the old kernel with the new
hypervisor until the compatibility is reached again.

BTW, what is the recommended distro or kernel setup to help testing
the latest paravirt patches? I've got a spare machine (with no needed
data) at hand which could be put to good use.

--
Greetz, Antonio Vargas aka winden of network

2006-08-04 07:00:33

by Chris Wright

Subject: Re: A proposal - binary

* Antonio Vargas ([email protected]) wrote:
> One feature I found missing at the paravirt patches is to allow the
> user to forbid the use of paravirtualization of certain features (via
> a bitmask on the kernel commandline for example) so that the execution
> drops into the native hardware virtualization system. Such a feature

There is no native hardware virtualization system in this picture. Maybe
I'm just misunderstanding you.

> would provide a big upwards compatibility for the kernel<->hypervisor
> system. The case for this would be needing to forcefully upgrade the
> hypervisor due to security issues and finding out that the hypervisor
> is incompatible at the paravirtualization level, then the user would
> be at least capable of continuing to run the old kernel with the new
> hypervisor until the compatibility is reached again.

This seems a bit like a trumped up example, as randomly disabling a part
of the pv interface is likely to cause correctness issues, not just
performance degradation.

Hypervisor compatibility is a slightly separate issue here. There's two
interfaces. The linux paravirt interface is internal to the kernel.
The hypervisor interface is external to the kernel.

kernel <--pv interface--> paravirt glue layer <--hv interface--> hypervisor

So changes to the hypervisor must remain ABI compatible to continue
working with the same kernel. This is the same requirement the kernel
has with the syscall interface it provides to userspace.
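
One way to picture that requirement is a version check in the glue layer.
The sketch below assumes a major/minor numbering convention (same major,
equal or newer minor means compatible). That convention is an assumption
made for illustration, not something specified by VMI, Xen or the paravirt
patches, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical version stamp for the hv interface. */
struct hv_abi_version {
    uint16_t major;   /* bumped on incompatible changes */
    uint16_t minor;   /* bumped on backward-compatible additions */
};

/*
 * Returns true if a glue layer built against `needs` can run on a
 * hypervisor offering `offered`: the major numbers must match, and the
 * hypervisor must provide at least the minor level the glue relies on.
 */
bool hv_abi_compatible(const struct hv_abi_version *needs,
                       const struct hv_abi_version *offered)
{
    assert(needs != NULL && offered != NULL);

    if (offered->major != needs->major)
        return false;
    return offered->minor >= needs->minor;
}
```

Under this scheme a hypervisor update that only adds calls keeps old kernels
running, while a change of major number means the glue layer in the kernel
has to be updated before the guest can boot on it.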

> BTW, what is the recommended distro or kernel setup to help testing
> the latest paravirt patches? I've got a spare machine (with no needed
> data) at hand which could be put to good use.

Distro of choice. Current kernel with the pv patches[1], but be
forewarned, they are very early, and not fully booting.

thanks,
-chris

[1] mercurial patchqueue http://ozlabs.org/~rusty/paravirt/

2006-08-04 07:05:07

by Rusty Russell

Subject: Re: A proposal - binary

On Thu, 2006-08-03 at 22:53 -0700, Andrew Morton wrote:
> On Fri, 04 Aug 2006 15:04:35 +1000
> Rusty Russell <[email protected]> wrote:
>
> > On Thu, 2006-08-03 at 21:18 -0700, Andrew Morton wrote:
> > Everywhere in the kernel where we have multiple implementations we want
> > to select at runtime, we use an ops struct. Why should the choice of
> > Xen/VMI/native/other be any different?
>
> VMI is being proposed as an appropriate way to connect Linux to Xen. If
> that is true then no other glue is needed.

Sorry, this is wrong. VMI was proposed as the appropriate way to
connect Linux to Xen, *and* native, *and* VMWare's hypervisors (and
others). This way one Linux binary can boot on all three, using
different VMI blobs.

> > Yes, we could force native and Xen to work via VMI, but the result would
> > be less clear, less maintainable, and gratuitously different from
> > elsewhere in the kernel.
>
> I suspect others would disagree with that. We're at the stage of needing
> to see code to settle this.

Wrong again. We've *seen* the code for VMI, and it's fairly hairy. Seeing
the native-implementation and Xen-implementation VMI blobs will not make
it less hairy!

> > And, of course, unlike paravirt_ops where we
> > can change and add ops at any time, we can't similarly change the VMI
> > interface because it's an ABI (that's the point: the hypervisor can
> > provide the implementation).
>
> hm. Dunno. ABIs can be uprevved. Perhaps.

Certainly VMI can be. But I'd prefer to leave the excellent hackers at
VMWare with the task of maintaining their ABI, and let Linux hackers
(most of whom will run native) manipulate paravirt_ops freely.

We're not good at maintaining ABIs. We're going to be especially bad at
maintaining an ABI when the 99% of us running native will never notice
the breakage.

Hope that clarifies,
Rusty.
--
Help! Save Australia from the worst of the DMCA: http://linux.org.au/law

2006-08-04 07:19:47

by Antonio Vargas

Subject: Re: A proposal - binary

On 8/4/06, Chris Wright <[email protected]> wrote:
> * Antonio Vargas ([email protected]) wrote:
> > One feature I found missing at the paravirt patches is to allow the
> > user to forbid the use of paravirtualization of certain features (via
> > a bitmask on the kernel commandline for example) so that the execution
> > drops into the native hardware virtualization system. Such a feature
>
> There is no native hardware virtualization system in this picture. Maybe
> I'm just misunderstanding you.

What I was referring to with "native hardware virtualization" is just the
VT or Pacifica-provided trapping into the hypervisor upon executing
"dangerous" instructions such as tlb-flushes, reading/setting the
current ring-level, cli/sti...

> > would provide a big upwards compatibility for the kernel<->hypervisor
> > system. The case for this would be needing to forcefully upgrade the
> > hypervisor due to security issues and finding out that the hypervisor
> > is incompatible at the paravirtualization level, then the user would
> > be at least capable of continuing to run the old kernel with the new
> > hypervisor until the compatibility is reached again.
>
> This seems a bit like a trumped up example, as randomly disabling a part
> of the pv interface is likely to cause correctness issues, not just
> performance degradation.

Yes, maybe just providing a switch to force paravirtops to use the
native hardware implementation would be enough, or just in case,
making the default the native hardware and allowing the kernel
commandline to select another one (just like on io-schedulers)

> Hypervisor compatibility is a slightly separate issue here. There's two
> interfaces. The linux paravirt interface is internal to the kernel.
> The hypervisor interface is external to the kernel.
>
> kernel <--pv interface--> paravirt glue layer <--hv interface--> hypervisor
>
> So changes to the hypervisor must remain ABI compatible to continue
> working with the same kernel. This is the same requirement the kernel
> has with the syscall interface it provides to userspace.

Yes. What I propose is allowing the systems to continue running (only
with degraded performance) when the hv-interface between the running
kernel and the running hypervisor doesn't match.

> > BTW, what is the recommended distro or kernel setup to help testing
> > the latest paravirt patches? I've got a spare machine (with no needed
> > data) at hand which could be put to good use.
>
> Distro of choice. Current kernel with the pv patches[1], but be
> forewarned, they are very early, and not fully booting.

Thanks, will be setting it up :)

--
Greetz, Antonio Vargas aka winden of network

2006-08-04 07:21:40

by Andrew Morton

Subject: Re: A proposal - binary

On Fri, 04 Aug 2006 17:04:59 +1000
Rusty Russell <[email protected]> wrote:

> On Thu, 2006-08-03 at 22:53 -0700, Andrew Morton wrote:
> > On Fri, 04 Aug 2006 15:04:35 +1000
> > Rusty Russell <[email protected]> wrote:
> >
> > > On Thu, 2006-08-03 at 21:18 -0700, Andrew Morton wrote:
> > > Everywhere in the kernel where we have multiple implementations we want
> > > to select at runtime, we use an ops struct. Why should the choice of
> > > Xen/VMI/native/other be any different?
> >
> > VMI is being proposed as an appropriate way to connect Linux to Xen. If
> > that is true then no other glue is needed.
>
> Sorry, this is wrong.

It's actually 100% correct.

> VMI was proposed as the appropriate way to
> connect Linux to Xen, *and* native, *and* VMWare's hypervisors (and
> others). This way one Linux binary can boot on all three, using
> different VMI blobs.

That also is correct.

> > > Yes, we could force native and Xen to work via VMI, but the result would
> > > be less clear, less maintainable, and gratuitously different from
> > > elsewhere in the kernel.
> >
> > I suspect others would disagree with that. We're at the stage of needing
> > to see code to settle this.
>
> Wrong again.

I was referring to the VMI-for-Xen code.

> We've *seen* the code for VMI, and it's fairly hairy.

I probably slept through that discussion - I don't recall that things were
that bad. Do you recall the Subject: or date?


2006-08-04 07:36:38

by Chris Wright

Subject: Re: A proposal - binary

* Antonio Vargas ([email protected]) wrote:
> What I was referring to with "native hardware virtualization" is just the
> VT or Pacifica-provided trapping into the hypervisor upon executing
> "dangerous" instructions such as tlb-flushes, reading/setting the
> current ring-level, cli/sti...

We are not talking about VMX or AMDV. Just plain ol' x86 hardware.

> Yes, maybe just providing a switch to force paravirtops to use the
> native hardware implementation would be enough, or just in case,
> making the default the native hardware and allowing the kernel
> commandline to select another one (just like on io-schedulers)

In this case native hardware == running on bare metal w/out VMX/AMDV
support and w/out any hypervisor. So, while this would let you actually
boot the machine, it's probably not really useful for the case you cited
(security update to hypervisor causes ABI breakage) because you'd be
booting a normal kernel w/out any virtualization. IOW, all the virtual
machines that were running on that physical machine would not be running.

> Yes. What I propose is allowing the systems to continue running (only
> with degraded performance) when the hv-interface between the running
> kernel and the running hypervisor doesn't match.

This is non-trivial. If the hv-interface breaks the ABI, then you'd
need to update the pv-glue layer in the kernel.

> >> BTW, what is the recommended distro or kernel setup to help testing
> >> the latest paravirt patches? I've got a spare machine (with no needed
> >> data) at hand which could be put to good use.
> >
> >Distro of choice. Current kernel with the pv patches[1], but be
> >forewarned, they are very early, and not fully booting.
>
> Thanks, will be setting it up :)

Thanks for helping.
-chris

2006-08-04 08:29:19

by Rusty Russell

Subject: Re: A proposal - binary

On Fri, 2006-08-04 at 00:21 -0700, Andrew Morton wrote:
> On Fri, 04 Aug 2006 17:04:59 +1000
> Rusty Russell <[email protected]> wrote:
>
> > On Thu, 2006-08-03 at 22:53 -0700, Andrew Morton wrote:
> > > VMI is being proposed as an appropriate way to connect Linux to Xen. If
> > > that is true then no other glue is needed.
> >
> > Sorry, this is wrong.
>
> It's actually 100% correct.

Err, yes. I actually misrepresented VMI: the native implementation is
inline (ie. no blob is required for native). Bad Rusty.

> > > > Yes, we could force native and Xen to work via VMI, but the result would
> > > > be less clear, less maintainable, and gratuitously different from
> > > > elsewhere in the kernel.
> > >
> > > I suspect others would disagree with that. We're at the stage of needing
> > > to see code to settle this.
> >
> > Wrong again.
>
> I was referring to the VMI-for-Xen code.

I know. And I repeat, we don't have to see that part, to know that the
result is less clear, less maintainable and gratuitously different from
elsewhere in the kernel than the paravirt_ops approach. We've seen
paravirt and the VMI parts of this already.

> > We've *seen* the code for VMI, and it's fairly hairy.
>
> I probably slept through that discussion - I don't recall that things were
> that bad. Do you recall the Subject: or date?

Read the patches which Zach sent back in March, particularly:

[RFC, PATCH 3/24] i386 Vmi interface definition
[RFC, PATCH 4/24] i386 Vmi inline implementation
[RFC, PATCH 5/24] i386 Vmi code patching

If you want to hack on x86 arch code, you'd need to understand these.

Then to see the paravirt patches go to http://ozlabs.org/~rusty/paravirt
and look at the approximately-equivalent paravirt_ops patches:

008-paravirt-structure.patch
009-binary-patch.patch

There's nothing in those paravirt_ops patches which will surprise any
kernel hacker. That's my entire point: maintainable, unsurprising,
clear.

Rusty.
--
Help! Save Australia from the worst of the DMCA: http://linux.org.au/law

2006-08-04 08:56:31

by Christoph Hellwig

Subject: Re: A proposal - binary

On Thu, Aug 03, 2006 at 02:41:27PM -0700, Zachary Amsden wrote:
> We have a working implementation of an ABI that interfaces to both ESX
> and Xen.

Until you stop violating our copyrights with the VMWare ESX support nothing
is going to be supported. So could you please stop abusing the Linux code
illegally in your project so I don't have to sue you, or at least piss off
and don't expect us to support you in violating our copyrights. I know this
isn't your fault, but please get the VMware/EMC legal department to fix it
up first.

2006-08-04 10:02:32

by Alan

Subject: Re: A proposal - binary

On Thu, 2006-08-03 at 16:40 -0700, Zachary Amsden wrote:
> But I think we're running off into the weeds picking nits here..

Your entire proposal started in the weeds anyway.

Every other hypervisor system supported by Linux has a source code
interface on the Linux side that works reliably and compatibly through
versions and releases. The terrible things you claim will happen have
not. IBM have been doing virtualisation for something like forty years.
IBM is not using magic binary blobs. This also leads me to the
conclusion that you are wrong.

Alan

2006-08-04 14:34:51

by Theodore Ts'o

Subject: Re: A proposal - binary

On Fri, Aug 04, 2006 at 11:21:25AM +0100, Alan Cox wrote:
> Every other hypervisor system supported by Linux has a source code
> interface on the Linux side that works reliably and compatibly through
> versions and releases. The terrible things you claim will happen have
> not. IBM have been doing virtualisation for something like forty years.
> IBM is not using magic binary blobs. This also leads me to the
> conclusion that you are wrong.

Well, let's be a little fair here. Part of the problem, which is not
VMWare's fault, is that Intel a long time ago screwed up the x86 ISA
to make it non-virtualizable without all sorts of nasty hacks. (Some
say that this was done deliberately so Intel could sell more chips,
but I haven't personally seen any proof of this; it's not the point
either way, however.)

IBM's virtualization *does* have magic blobs; it's called the
hypervisor. The difference is that the PowerPC has a deliberately
castrated architecture, so that when you are running a guest
operating system in an LPAR and you do things like mess with
page tables (for example), it traps to the hypervisor, which is really
"a magic binary blob" running on the bare Power architecture. The
difference is that the way you trap into the hypervisor is via a
PowerPC instruction that looks like a native instruction call.

The bottom line is that the boundary between legal and illegal magic
binary blobs is more of a grey area than we might want to admit.

For example, what if Transmeta was still around, and released a
digitally signed "magic binary blob" which provided VT/Pacifica
capabilities to a Transmeta processor? (And if --- hypothetically ---
a version of Linux that required VT/Pacifica was released
under the GPLv3, would the RSA private key used to sign Transmeta's
"magic binary blob" be considered "corresponding source" and the GPLv3
used as a way to beat Transmeta into producing the private keys used to
sign their firmware update? It's after all a "necessary authentication
and encryption key" needed to install this hypothetical version of
Linux. :-)

As another example, the Alpha architecture has specified magic binary
blobs --- called PALcode --- where different binary PALcodes can be
needed for different OS's, and implement various privileged
instructions which are specifically intended for OS-level
functionality.

The real problem, though, is demonstrated by yet another "magic binary
blob" which we in fact already deal with, and that's ACPI. The
problem with "magic binary blobs" is that it's incredibly easy to do
a disastrously bad job of defining the interfaces and of providing,
requiring, and performing conformance tests for the binary blobs. Witness
the ongoing nightmare caused by different ACPI binary
blob providers doing stupid things that are only tested on Windows.

So I don't think the argument is necessarily that magic binary blobs
are illegal from the GPL perspective, but rather that magic binary
blobs are very hard to get right. (For example, I still remember
really bad experiences with different firmware versions for Cisco's
aironet wireless hardware being needed depending on which OS and which
version of the driver you had on your OS. Troubleshooting *that* was
a nightmare.) And that, I think, is the biggest problem with the VMI
proposal: a lack of trust that the various VM providers will
actually do something sane, both from an interface design and provided
implementation perspective. This is why I think everyone keeps
harping on the question of debuggability.

No one has really complained about the debuggability, or lack thereof,
of the Power hypervisor, but OTOH IBM does spend vast amounts of $$$
making sure that it is stable and the interfaces are well-documented
and locked down.

- Ted

2006-08-04 17:00:35

by David Lang

[permalink] [raw]
Subject: Re: A proposal - binary

On Fri, 4 Aug 2006, Rusty Russell wrote:

> On Thu, 2006-08-03 at 22:53 -0700, Andrew Morton wrote:
>> On Fri, 04 Aug 2006 15:04:35 +1000
>> Rusty Russell <[email protected]> wrote:
>>
>>> On Thu, 2006-08-03 at 21:18 -0700, Andrew Morton wrote:
>>> Everywhere in the kernel where we have multiple implementations we want
>>> to select at runtime, we use an ops struct. Why should the choice of
>>> Xen/VMI/native/other be any different?
>>
>> VMI is being proposed as an appropriate way to connect Linux to Xen. If
>> that is true then no other glue is needed.
>
> Sorry, this is wrong. VMI was proposed as the appropriate way to
> connect Linux to Xen, *and* native, *and* VMWare's hypervisors (and
> others). This way one Linux binary can boot on all three, using
> different VMI blobs.
>
>>> Yes, we could force native and Xen to work via VMI, but the result would
>>> be less clear, less maintainable, and gratuitously different from
>>> elsewhere in the kernel.
>>
>> I suspect others would disagree with that. We're at the stage of needing
>> to see code to settle this.
>
> Wrong again. We've *seen* the code for VMI, and it's fairly hairy. Seeing
> the native-implementation and Xen-implementation VMI blobs will not make
> it less hairy!
>
>>> And, of course, unlike paravirt_ops where we
>>> can change and add ops at any time, we can't similarly change the VMI
>>> interface because it's an ABI (that's the point: the hypervisor can
>>> provide the implementation).
>>
>> hm. Dunno. ABIs can be uprevved. Perhaps.
>
> Certainly VMI can be. But I'd prefer to leave the excellent hackers at
> VMWare with the task of maintaining their ABI, and let Linux hackers
> (most of whom will run native) manipulate paravirt_ops freely.
>
> We're not good at maintaining ABIs. We're going to be especially bad at
> maintaining an ABI when the 99% of us running native will never notice
> the breakage.

some questions from a user. please point out where I am misunderstanding things.

one of the big uses of virtualization will be to run things in sandboxes, when
people do this they typically migrate the sandbox from system to system over time
(working with chroot sandboxes I've seen some HUGE skews between what's running
in the sandbox and what's running in the host). If the interface between the
guest kernel and the hypervisor isn't fixed how could someone run a 2.6.19 guest
and a 2.6.30 guest at the same time?

if it's only a source-level API this implies that when you move your host kernel
from 2.6.19 to 2.6.25 you would need to recompile your 2.6.19 guest kernel to
support the modifications. where are the patches going to come from to do this?

It seems to me from reading this thread that the PowerPC and S390 have an ABI
defined, specifically defined by the hardware in the case of PowerPC and by the
externally maintained, Linux-independent hypervisor (which is effectively the
hardware) in the case of the s390.

If there's going to be long-term compatibility between different hosts and
guests there need to be some limits to what can change.

needing to uprev the host when you uprev a guest is acceptable

needing to uprev a guest when you uprev a host is not.

this basically boils down to 'once you expose an interface to a user it can't
change', with the interface that's being exposed being the calls that the guest
makes to the host.

David Lang

2006-08-04 18:33:44

by Chris Wright

[permalink] [raw]
Subject: Re: A proposal - binary

* Greg KH ([email protected]) wrote:
> > Who said that? Please smack them on the head with a broom. We are all
> > actively working on implementing Rusty's paravirt-ops proposal. It
> > makes the API vs ABI discussion moot, as it allows for both.
>
> So everyone is still skirting the issue, oh great :)

No, we are working closely together on Rusty's paravirt ops proposal.
Given the number of questions I've fielded in the last 24 hrs, I really
don't think people understand this.

We are actively developing paravirt ops, we have a patch series that
begins to implement it (although it's still in its nascent stage). If
anybody is interested in our work it is done in public. The working
tree is here: http://ozlabs.org/~rusty/paravirt/ (mercurial patchqueue,
just be forewarned that it's still quite early to be playing with it,
doesn't do much yet). We are using the virtualization mailing list for
discussions https://lists.osdl.org/mailman/listinfo/virtualization if
you are interested.

Zach (please correct me if I'm wrong here), is working on plugging the
VMI into the paravirt_ops interface. So his discussion of binary
interface issues is as a consumer of the paravirt_ops interface.

So, in case it's not clear, we are all working together to get
paravirt_ops upstream. My personal intention is to do everything I can
to help get things in shape to queue for 2.6.19 inclusion, and having
confusion over our direction does not help with that aggressive timeline.

thanks,
-chris

2006-08-04 18:38:18

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: A proposal - binary

David Lang wrote:
> if it's only a source-level API this implies that when you move your
> host kernel from 2.6.19 to 2.6.25 you would need to recompile your
> 2.6.19 guest kernel to support the modifications. where are the
> patches going to come from to do this?

No, the low-level interface between the kernel and the hypervisor is an
ABI, which will be as stable as your hypervisor author/vendor wants it to
be (which is generally "very stable"). The question is whether that
low-level interface is exposed to the rest of the kernel directly, or
hidden behind a kernel-internal source-level API.
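
A minimal sketch of that layering, written in Python rather than kernel C
so it can be run as-is. It assumes a made-up hypervisor with two frozen
hypercall numbers; none of these names come from paravirt_ops, VMI or Xen.
The point is only that the numbered calls are the ABI, while the method the
rest of the "kernel" uses is an internal API that could be renamed or
reshaped without the hypervisor noticing.

```python
# Hypothetical hypervisor ABI: call numbers and argument order are frozen.
HCALL_SET_PAGE_TABLE = 1
HCALL_FLUSH_TLB = 2


def toy_hypervisor(call_number, arg=0):
    """Stands in for the trap into a hypervisor; returns what it did."""
    if call_number == HCALL_SET_PAGE_TABLE:
        return f"hv: page table base = {arg:#x}"
    if call_number == HCALL_FLUSH_TLB:
        return "hv: tlb flushed"
    raise ValueError(f"unknown hypercall {call_number}")


class HypervisorMMU:
    """Kernel-internal API implemented on top of the frozen ABI."""

    def switch_address_space(self, base):
        # One internal operation may map onto several ABI calls.
        return [
            toy_hypervisor(HCALL_SET_PAGE_TABLE, base),
            toy_hypervisor(HCALL_FLUSH_TLB),
        ]


class NativeMMU:
    """Same internal API, implemented directly for bare hardware."""

    def switch_address_space(self, base):
        return [f"native: load page table base {base:#x}"]


def context_switch(mmu, base):
    """Generic kernel code: sees only the internal API, never the ABI."""
    return mmu.switch_address_space(base)
```

Calling `context_switch(HypervisorMMU(), 0x1000)` goes through the two
hypercalls, while `context_switch(NativeMMU(), 0x1000)` never touches them.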

J

2006-08-04 18:46:08

by Antonio Vargas

[permalink] [raw]
Subject: Re: A proposal - binary

On 8/4/06, David Lang <[email protected]> wrote:
> On Fri, 4 Aug 2006, Rusty Russell wrote:
>
> > On Thu, 2006-08-03 at 22:53 -0700, Andrew Morton wrote:
> >> On Fri, 04 Aug 2006 15:04:35 +1000
> >> Rusty Russell <[email protected]> wrote:
> >>
> >>> On Thu, 2006-08-03 at 21:18 -0700, Andrew Morton wrote:
> >>> Everywhere in the kernel where we have multiple implementations we want
> >>> to select at runtime, we use an ops struct. Why should the choice of
> >>> Xen/VMI/native/other be any different?
> >>
> >> VMI is being proposed as an appropriate way to connect Linux to Xen. If
> >> that is true then no other glue is needed.
> >
> > Sorry, this is wrong. VMI was proposed as the appropriate way to
> > connect Linux to Xen, *and* native, *and* VMWare's hypervisors (and
> > others). This way one Linux binary can boot on all three, using
> > different VMI blobs.
> >
> >>> Yes, we could force native and Xen to work via VMI, but the result would
> >>> be less clear, less maintainable, and gratuitously different from
> >>> elsewhere in the kernel.
> >>
> >> I suspect others would disagree with that. We're at the stage of needing
> >> to see code to settle this.
> >
> > Wrong again. We've *seen* the code for VMI, and it's fairly hairy. Seeing
> > the native-implementation and Xen-implementation VMI blobs will not make
> > it less hairy!
> >
> >>> And, of course, unlike paravirt_ops where we
> >>> can change and add ops at any time, we can't similarly change the VMI
> >>> interface because it's an ABI (that's the point: the hypervisor can
> >>> provide the implementation).
> >>
> >> hm. Dunno. ABIs can be uprevved. Perhaps.
> >
> > Certainly VMI can be. But I'd prefer to leave the excellent hackers at
> > VMWare with the task of maintaining their ABI, and let Linux hackers
> > (most of whom will run native) manipulate paravirt_ops freely.
> >
> > We're not good at maintaining ABIs. We're going to be especially bad at
> > maintaining an ABI when the 99% of us running native will never notice
> > the breakage.
>
> some questions from a user. please point out where I am misunderstanding things.

asking is the smart way :)

> one of the big uses of virtualization will be to run things in sandboxes, when
> people do this they typically migrate the sandbox from system to system over time
> (working with chroot sandboxes I've seen some HUGE skews between what's running
> in the sandbox and what's running in the host). If the interface between the
> guest kernel and the hypervisor isn't fixed how could someone run a 2.6.19 guest
> and a 2.6.30 guest at the same time?
>
> if it's only a source-level API this implies that when you move your host kernel
> from 2.6.19 to 2.6.25 you would need to recompile your 2.6.19 guest kernel to
> support the modifications. where are the patches going to come from to do this?
>
> It seems to me from reading this thread that the PowerPC and S390 have an ABI
> defined, specifically defined by the hardware in the case of PowerPC and by the
> externally maintained, Linux-independent hypervisor (which is effectively the
> hardware) in the case of the s390.

the trick with ppc, s390, m68k... is that they were defined since day
zero (*simplifies 68000/68010 history here*) so that when you run as
a non-privileged task and try to execute a privileged instruction,
the protection hardware traps and the OS gets control. x86 wasn't, since
it had some instructions that let non-privileged code detect that it
was non-privileged, thus barring any way of making the hypervisor invisible.
this is solved on x86 and x86_64 with the new extensions.
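
A toy model of that difference, in Python. It is a sketch under stated
assumptions: the two operation names and the `ToyCPU` class are invented
for illustration and do not correspond to real instructions. One CPU
traps on every sensitive operation; the other lets one of them run
unprivileged and report the real privilege level, which is the kind of
leak described above.

```python
class Trap(Exception):
    """Raised when unprivileged code executes a trapping instruction."""


class ToyCPU:
    def __init__(self, all_sensitive_ops_trap):
        self.privileged = False  # the guest kernel runs deprivileged
        self.all_sensitive_ops_trap = all_sensitive_ops_trap

    def execute(self, op):
        if op == "set_page_table":
            # Privileged on both CPUs: traps when run unprivileged.
            if not self.privileged:
                raise Trap(op)
            return "done"
        if op == "read_privilege_level":
            # Sensitive: the result depends on the real privilege level.
            if self.all_sensitive_ops_trap and not self.privileged:
                raise Trap(op)
            return "kernel" if self.privileged else "user"
        raise ValueError(op)


def run_guest_op(cpu, op):
    """Hypervisor: let the guest run op, and emulate it if it traps."""
    try:
        return cpu.execute(op)
    except Trap:
        # Emulate against the guest's *virtual* state: the guest believes
        # it is a kernel, so that is what it is shown.
        emulated = {"set_page_table": "done", "read_privilege_level": "kernel"}
        return emulated[op]


virtualizable = ToyCPU(all_sensitive_ops_trap=True)
leaky = ToyCPU(all_sensitive_ops_trap=False)
```

On `virtualizable` the guest always sees "kernel"; on `leaky` the
non-trapping read returns "user" and the hypervisor never gets a chance
to intervene.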

> If there's going to be long-term compatibility between different hosts and
> guests there need to be some limits to what can change.
>
> needing to uprev the host when you uprev a guest is acceptable
>
> needing to uprev a guest when you uprev a host is not.

Now, allowing this transparent operation is great since you can run your
normal kernel as-is as a guest. But to get close to 100% speed, what
you do is to rewrite parts of the OS to be aware of the hypervisor,
and establish a common way to talk.

Hence the work on paravirt-ops. Just like you can use any
filesystem under linux because they have a well-defined interface to
the rest of the kernel, the paravirt-ops are the way we are working to
define an interface so that the rest of the kernel can be ignorant of
whether it's running on the bare metal or as a guest.

Then, if you needed to run say 2.6.19 with hypervisor A-1.0, you just
need to write paravirt-ops which talk and translate between 2.6.19 and
A-1.0. If 5 years later you are still running A-1.0 and want to run a
2.6.28 guest, then you would just need to write the paravirt-ops
between 2.6.28 and A-1.0, with no need to modify the rest of the code
or the hypervisor.

At the moment we only have 1 GPL hypervisor and 1 binary one. So
maybe we need to decide whether linux should help with running under binary
hypervisors, but imagine that instead of this we had the usual Ghyper
vs Khyper separation. We would prefer to give the same adaptations to
both of them and abstract them away just like we do with filesystems.

> this basically boils down to 'once you expose an interface to a user it can't
> change', with the interface that's being exposed being the calls that the guest
> makes to the host.

Yes, that's the reason some mentioned ppc, sparc, s390... because they
have been doing this longer than us and we could consider adopting
some of their designs (just like we did for POSIX system calls ;)

> David Lang

--
Greetz, Antonio Vargas aka winden of network

2006-08-04 19:09:54

by David Lang

[permalink] [raw]
Subject: Re: A proposal - binary

On Fri, 4 Aug 2006, Antonio Vargas wrote:

>> If there's going to be long-term compatibility between different hosts and
>> guests there need to be some limits to what can change.
>>
>> needing to uprev the host when you uprev a guest is acceptable
>>
>> needing to uprev a guest when you uprev a host is not.
>
> Now, allowing this transparent operation is great since you can run your
> normal kernel as-is as a guest. But to get close to 100% speed, what
> you do is to rewrite parts of the OS to be aware of the hypervisor,
> and establish a common way to talk.

I understand this, but for example a UML 2.6.10 kernel will continue to run
unmodified on top of a 2.6.17 kernel, the ABI used is stable. however if you
have a 2.6.10 host with a 2.6.10 UML guest and want to run a 2.6.17 guest you
may (but not necessarily must) have to upgrade the host to 2.6.17 or later.

> Hence the work on paravirt-ops. Just like you can use any
> filesystem under linux because they have a well-defined interface to
> the rest of the kernel, the paravirt-ops are the way we are working to
> define an interface so that the rest of the kernel can be ignorant of
> whether it's running on the bare metal or as a guest.
>
> Then, if you needed to run say 2.6.19 with hypervisor A-1.0, you just
> need to write paravirt-ops which talk and translate between 2.6.19 and
> A-1.0. If 5 years later you are still running A-1.0 and want to run a
> 2.6.28 guest, then you would just need to write the paravirt-ops
> between 2.6.28 and A-1.0, with no need to modify the rest of the code
> or the hypervisor.

who is going to be writing all these interface layers to connect each kernel
version to each hypervisor version? and please note, I am not just considering
Xen and vmware as hypervisors, a vanilla linux kernel is the hypervisor for UML.
so just stating that the hypervisor maintainers need to do this is implying that
the kernel maintainers would be required to do this.

also I'm looking at the more likely case that 5 years from now you may still be
running 2.6.19, but need to upgrade to hypervisor A-5.8 (to support a different
client). you don't want to have to try and recompile the 2.6.19 kernel to keep
using it.

> At the moment we only have 1 GPL hypervisor and 1 binary one. So
> maybe we need to decide whether linux should help with running under binary
> hypervisors, but imagine that instead of this we had the usual Ghyper
> vs Khyper separation. We would prefer to give the same adaptations to
> both of them and abstract them away just like we do with filesystems.

you have three hypervisors that I know of: Linux, Xen (multiple versions), and
VMware, each with (mostly) incompatible guests.

>> this basically boils down to 'once you expose an interface to a user it can't
>> change', with the interface that's being exposed being the calls that the
>> guest
>> makes to the host.
>
> Yes, that's the reason some mentioned ppc, sparc, s390... because they
> have been doing this longer than us and we could consider adopting
> some of their designs (just like we did for POSIX system calls ;)

I'm not commenting on any of the specifics of the interface calls (I trust you
guys to make that be sane :-) I'm just responding to the idea that the
interface actually needs to be locked down to an ABI as opposed to just
source-level compatibility.

David Lang

2006-08-04 19:27:05

by Arjan van de Ven

[permalink] [raw]
Subject: Re: A proposal - binary

David Lang wrote:
> I'm not commenting on any of the specifics of the interface calls (I
> trust you guys to make that be sane :-) I'm just responding to the idea
> that the interface actually needs to be locked down to an ABI as opposed
> to just source-level compatibility.

you are right that the interface to the HV should be stable. But those
interfaces are going to be specific to the HV; the paravirt_ops allows the
kernel to smoothly deal with having different HV's.
So in a way it's an API interface to allow the kernel to deal with multiple
different ABIs that exist today and will in the future.

2006-08-04 19:47:42

by Jeff Dike

[permalink] [raw]
Subject: Re: A proposal - binary

On Fri, Aug 04, 2006 at 12:06:28PM -0700, David Lang wrote:
> I understand this, but for example a UML 2.6.10 kernel will continue to run
> unmodified on top of a 2.6.17 kernel, the ABI used is stable. however if
> you have a 2.6.10 host with a 2.6.10 UML guest and want to run a 2.6.17
> guest you may (but not necessarily must) have to upgrade the host to 2.6.17
> or later.

Why might you have to do that?

Jeff

2006-08-04 19:49:25

by David Lang

[permalink] [raw]
Subject: Re: A proposal - binary

On Fri, 4 Aug 2006, Arjan van de Ven wrote:

> David Lang wrote:
>> I'm not commenting on any of the specifics of the interface calls (I trust
>> you guys to make that be sane :-) I'm just responding to the idea that the
>> interface actually needs to be locked down to an ABI as opposed to just
>> source-level compatibility.
>
> you are right that the interface to the HV should be stable. But those are
> going
> to be specific to the HV, the paravirt_ops allows the kernel to smoothly deal
> with having different HV's.
> So in a way it's an API interface to allow the kernel to deal with multiple
> different ABIs that exist today and will in the future.

so if I understand this correctly we are saying that a kernel compiled to run on
hypervisor A would need to be recompiled to run on hypervisor B, and recompiled
again to run on hypervisor C, etc

where A could be bare hardware, B could be Xen 2, C could be Xen 3, D could be
vmware, E could be vanilla Linux, etc.

this sounds like something that the distros would not support, they would pick
their one hypervisor to support and leave out the others. the big problem with
this is that the preferred hypervisor will change over time and people will be
left with incompatible choices (or having to compile their own kernels,
including having to recompile older kernels to support newer hypervisors)

David Lang


2006-08-04 19:52:48

by David Lang

[permalink] [raw]
Subject: Re: A proposal - binary

On Fri, 4 Aug 2006, Jeff Dike wrote:

> On Fri, Aug 04, 2006 at 12:06:28PM -0700, David Lang wrote:
>> I understand this, but for example a UML 2.6.10 kernel will continue to run
>> unmodified on top of a 2.6.17 kernel, the ABI used is stable. however if
>> you have a 2.6.10 host with a 2.6.10 UML guest and want to run a 2.6.17
>> guest you may (but not necessarily must) have to upgrade the host to 2.6.17
>> or later.
>
> Why might you have to do that?

take this with a grain of salt, I'm not saying the particular versions I'm
listing would require this

if your new guest kernel wants to use some new feature (SKAS3, time
virtualization, etc) but the older host kernel didn't support some system call
necessary to implement it, you may need to upgrade the host kernel to one that
provides the new features.
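
The rule being argued for here (a newer guest may need a newer host, but a
newer host must never break an older guest) can be written down as a set
comparison. This is only a sketch: the feature names and the idea of
modelling a host as a set of interface names are assumptions made for the
example, not how any real host advertises its capabilities.

```python
def missing_features(guest_requires, host_provides):
    """Interfaces the guest needs that the host does not offer."""
    return set(guest_requires) - set(host_provides)


def can_run(guest_requires, host_provides):
    """A guest runs if the host offers every interface it needs."""
    return not missing_features(guest_requires, host_provides)


# Hypothetical feature sets. With a stable ABI a newer host only ever
# adds interfaces; it never removes or changes the old ones.
HOST_OLD = {"base"}
HOST_NEW = HOST_OLD | {"time_virt"}

GUEST_OLD = {"base"}
GUEST_NEW = {"base", "time_virt"}
```

Under that assumption `can_run(GUEST_OLD, HOST_NEW)` stays true after a host
upgrade, while `can_run(GUEST_NEW, HOST_OLD)` is false until the host is
upgraded, which is the acceptable direction.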

David Lang

2006-08-04 20:11:37

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: A proposal - binary

David Lang wrote:
> so if I understand this correctly we are saying that a kernel compiled
> to run on hypervisor A would need to be recompiled to run on
> hypervisor B, and recompiled again to run on hypervisor C, etc
>
> where A could be bare hardware, B could be Xen 2, C could be Xen 3, D
> could be vmware, E could be vanilla Linux, etc.

Yes, but you can compile one kernel for any set of hypervisors, so if
you want both Xen and VMI, then compile both in. (You always get bare
hardware support.)
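
A rough illustration of "compile both in", as a Python sketch. The
`Backend` type, the `detect` probes and the shape of the boot environment
are all invented for the example; how a real kernel would recognise a
given hypervisor is not described in this thread.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Backend:
    name: str
    detect: Callable[[dict], bool]  # probe the boot environment


# Hypothetical probes for the backends that were compiled in.
COMPILED_IN = [
    Backend("xen", lambda env: env.get("hypervisor") == "xen"),
    Backend("vmi", lambda env: env.get("hypervisor") == "vmi"),
]

# Bare hardware support is always present and always matches.
NATIVE = Backend("native", lambda env: True)


def select_backend(env, compiled_in=COMPILED_IN):
    """Pick the first compiled-in backend whose probe succeeds."""
    for backend in compiled_in:
        if backend.detect(env):
            return backend
    return NATIVE
```

The same "binary" then boots as `xen`, `vmi` or `native` depending on what
it finds; a hypervisor that was not compiled in falls back to `native`
here, which is exactly the gap David Lang's Xen4 question is about.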

> this sounds like something that the distros would not support, they
> would pick their one hypervisor to support and leave out the others.
> the big problem with this is that the preferred hypervisor will change
> over time and people will be left with incompatible choices (or having
> to compile their own kernels, including having to recompile older
> kernels to support newer hypervisors)

Why? That's like saying that distros will only bother to compile in one
scsi driver.

The hypervisor driver is trickier than a normal kernel device driver,
because in general it needs to be present from very early in boot, which
precludes it from being a normal module. There's hope that we'll be
able to support hypervisor drivers as boot-time grub/multiboot modules,
so you'll be able to compile up a new hypervisor driver for a particular
kernel and use it without recompiling the whole thing.


J

2006-08-04 20:39:06

by David Lang

[permalink] [raw]
Subject: Re: A proposal - binary

On Fri, 4 Aug 2006, Jeremy Fitzhardinge wrote:

>> so if I understand this correctly we are saying that a kernel compiled to
>> run on hypervisor A would need to be recompiled to run on hypervisor B, and
>> recompiled again to run on hypervisor C, etc
>>
>> where A could be bare hardware, B could be Xen 2, C could be Xen 3, D could
>> be vmware, E could be vanilla Linux, etc.
>
> Yes, but you can compile one kernel for any set of hypervisors, so if you
> want both Xen and VMI, then compile both in. (You always get bare hardware
> support.)

how can I compile in support for Xen4 on my 2.6.18 kernel? after all xen 2 and
xen3 are incompatible hypervisors so why wouldn't xen4 be? (and I realize there is
no xen4 yet, but there is likely to be one during the time virtual servers
created with 2.6.18 are still running)

>> this sounds like something that the distros would not support, they would
>> pick their one hypervisor to support and leave out the others. the big
>> problem with this is that the preferred hypervisor will change over time
>> and people will be left with incompatible choices (or having to compile
>> their own kernels, including having to recompile older kernels to support
>> newer hypervisors)
>
> Why? That's like saying that distros will only bother to compile in one scsi
> driver.
>
> The hypervisor driver is trickier than a normal kernel device driver, because
> in general it needs to be present from very early in boot, which precludes it
> from being a normal module. There's hope that we'll be able to support
> hypervisor drivers as boot-time grub/multiboot modules, so you'll be able to
> compile up a new hypervisor driver for a particular kernel and use it without
> recompiling the whole thing.

distros don't offer kernels with all options today, why would they in the future?
(how many distros offer separate 486/586/K6/K7/Pentium/P2/P3/P4 kernels? none.
they offer a least-common denominator kernel or two instead)

I also am missing something here. how can a system be compiled to do several
different things for the same privileged opcode (including running that opcode)
without turning that area of code into a performance pig as it checks for each
possible hypervisor being present?
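
One way to picture the cost being asked about, as a Python sketch with
invented names. It contrasts checking the environment on every call with
checking once and keeping the chosen implementation, which is roughly what
an ops struct filled in at boot would amount to. Whether the remaining
indirect call is cheap enough is a separate question this sketch does not
measure.

```python
class Environment:
    def __init__(self, hypervisor=None):
        self.hypervisor = hypervisor
        self.probes = 0  # how many times the environment was examined

    def on_hypervisor(self):
        self.probes += 1
        return self.hypervisor is not None


def disable_interrupts_checked(env):
    """Examines the environment on every single call."""
    if env.on_hypervisor():
        return "hypercall: mask virtual interrupts"
    return "native: cli"


def bind_disable_interrupts(env):
    """Examines the environment once and returns the chosen operation."""
    if env.on_hypervisor():
        return lambda: "hypercall: mask virtual interrupts"
    return lambda: "native: cli"
```

A thousand calls through `disable_interrupts_checked` probe the environment
a thousand times; a thousand calls through the function returned by
`bind_disable_interrupts` probe it once.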

David Lang

2006-08-04 20:41:23

by Zachary Amsden

[permalink] [raw]
Subject: Re: A proposal - binary

Chris Wright wrote:
> * Greg KH ([email protected]) wrote:
>
>>> Who said that? Please smack them on the head with a broom. We are all
>>> actively working on implementing Rusty's paravirt-ops proposal. It
>>> makes the API vs ABI discussion moot, as it allows for both.
>>>
>> So everyone is still skirting the issue, oh great :)
>>
>
> No, we are working closely together on Rusty's paravirt ops proposal.
> Given the number of questions I've fielded in the last 24 hrs, I really
> don't think people understand this.
>
> We are actively developing paravirt ops, we have a patch series that
> begins to implement it (although it's still in its nascent stage). If
> anybody is interested in our work it is done in public. The working
> tree is here: http://ozlabs.org/~rusty/paravirt/ (mercurial patchqueue,
> just be forewarned that it's still quite early to be playing with it,
> doesn't do much yet). We are using the virtualization mailing list for
> discussions https://lists.osdl.org/mailman/listinfo/virtualization if
> you are interested.
>
> Zach (please correct me if I'm wrong here), is working on plugging the
> VMI into the paravirt_ops interface. So his discussion of binary
> interface issues is as a consumer of the paravirt_ops interface.
>

To be completely clear, I am creating a set of paravirt_ops for ESX.
This set of paravirt ops will still go through a binary indirection
layer. Hence, it is important for me to educate everyone on that layer
and find out the opinions people have on what an acceptable license /
source policy is for that layer. We need the layer for exactly the same
reason the vsyscall page is important. We use it to indirect
hypervisor calls so that they can be future compatible, instead of
forcing a particular hypervisor interface. When running on Intel vs.
AMD hardware, that interface may be different. When running inside HVM
hardware, VT or Pacifica, that interface _will_ be different. We must
allow for the possibility of alternative implementations. This layer is
very much like a PAL code layer that allows system level instructions to
have alternative implementations, and also, most importantly, means we
are free to change the structural layout of information which is shared
between the hypervisor and the kernel. This shared information will
grow and need to change as it evolves over time. But we can't break
compatibility with precompiled Linux kernels. So the layer needs to be
there and needs to be separate from the kernel, and I need to do that in
such a way that doesn't violate the licensing model of Linux or any
other operating system, while making sure that also doesn't conflict
with our corporate licensing policies. This is not a trivial problem.
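
The point about being free to change the layout of shared information can
be sketched as follows. This is an illustration only, in Python, with a
made-up shared page holding a timestamp; the layouts, field names and
accessor functions are assumptions and say nothing about the actual VMI
layer. Each hypervisor version ships the accessor that matches its own
layout, and the precompiled guest only ever calls the accessor.

```python
import struct


# Two hypothetical shared-page layouts. v2 puts a new field first,
# which moves the time field from offset 0 to offset 4.
def make_shared_page_v1(time_ns):
    return struct.pack("<Q", time_ns)           # [time]


def make_shared_page_v2(time_ns, cpu_id):
    return struct.pack("<IQ", cpu_id, time_ns)  # [cpu_id][time]


# The indirection layer: one accessor per hypervisor version.
def layer_v1_read_time(page):
    return struct.unpack_from("<Q", page, 0)[0]


def layer_v2_read_time(page):
    return struct.unpack_from("<Q", page, 4)[0]


def guest_read_time(layer_read_time, page):
    """Precompiled guest code: knows the call, not the layout."""
    return layer_read_time(page)


def guest_read_time_hardcoded(page):
    """A guest that baked the v1 offset into its own code instead."""
    return struct.unpack_from("<Q", page, 0)[0]
```

The guest that goes through the layer reads the right time from either
page; the guest with the hard-coded offset silently reads garbage from a
v2 page.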

> So, in case it's not clear, we are all working together to get
> paravirt_ops upstream. My personal intention is to do everything I can
> to help get things in shape to queue for 2.6.19 inclusion, and having
> confusion over our direction does not help with that aggressive timeline.
>
Paravirt_ops has long term benefits for the i386 (and x86_64)
architectures. This is independent in fact of whether Xen and VMware
want to use the same ABI to talk to the hypervisor or not. From my
point of view, it is a cleaner way to implement the kernel backend to
both VMI and Xen, since it removes the requirement that we create an
entirely new sub-architecture for each hypervisor. In the Xen case,
they may want to run a dom-0 hypervisor which is compiled for an actual
hardware sub-arch, such as Summit or ES7000. Using a sub-arch for the
hypervisor means you would need some kind of nested sub-architecture
support. This is ludicrous. Instead, what paravirt-ops promises long
term is a way to get rid of the sub-architecture layer altogether.
Sub-arches like Voyager and Visual workstation have some strange
initialization requirements, interrupt controllers, and SMP handling.
Exactly the kind of thing which paravirt_ops is being designed to
indirect for hypervisors. In the end, there is no reason it can't be
expanded to a more general purpose interface that removes the
requirement to build separate kernels and maintain separate
sub-architectures for each weird new tweak of i386. As i386 moves into
more embedded systems, I would expect to see these new sub-architectures
begin to grow like a rash. It's ugly, and hard to maintain. I've
broken SGI Visual workstation and Voyager support more than I'd care to
admit because it is really hard to compile and test all of these
different variations of i386. In the end, it will finally be possible
to compile and run a single i386 kernel binary that is actually capable
of running on the full set of supported hardware. This makes every
distro's and maintainer's life a lot simpler.

The same approach can be used on x86_64 for paravirtualization, but also
to abstract out vendor differences between platforms. Opteron and EM64T
hardware are quite different, and the plethora of non-standard
motherboards and uses have already intruded into the kernel. Having a
clean interface to encapsulate these changes is also desirable here, and
once we've nailed down a final approach to achieving this for i386, it
makes sense to do x86_64 as well.

I'm now talking lightyears into the future, but when the i386 and x86_64
trees merge together, this layer will be almost identical for the two,
allowing sharing of tricky pieces of code, like the APIC and IO-APIC,
NMI handling, system profiling, and power management. If the interface
evolves in a nicely packaged and compartmentalized way from that, then
perhaps someday it can grow to become a true cross-architecture way to
handle machine abstraction and virtualization. Then you can compile a
single kernel which gets assembled to code proto-fragments that are
dynamically linked together during the boot sequence, using a
cross-machine translation unit that allows a single kernel to run on
every current and future processor architecture that mimics some
combined set of machine characteristics (N-tiered cache coloring,
multiway hardware page tables, hypercubic interrupt routing, dynamically
morphed GPUs, quantum hypervisor isolation). Of course, it will still
require a PCI bus.

So absolutely we should go in that direction now, and I'm fully
committed to working on it. Which is why I wanted feedback on what we
have to do to make sure our ESX implementation is done in a way that is
acceptable to the community. I too would like to push for an interface
in 2.6.19, and we can't have confusion on this issue be a last minute
stopper.

Maybe someday Xen and VMware can share the same ABI interface and both
use a VMI like layer. But that really is a separate and completely
orthogonal question. Paravirt-ops makes any approach to integrating
hypervisor awareness into the kernel cleaner by providing an appropriate
abstract interface for it.

Zach

2006-08-04 20:52:01

by Chris Wright

Subject: Re: A proposal - binary

* Zachary Amsden wrote:
> Maybe someday Xen and VMware can share the same ABI interface and both
> use a VMI like layer. But that really is a separate and completely
> orthogonal question. Paravirt-ops makes any approach to integrating
> hypervisor awareness into the kernel cleaner by providing an appropriate
> abstract interface for it.

Thanks a lot for clarifying, Zach ;-)
-chris

2006-08-04 21:08:09

by Alan

Subject: Re: A proposal - binary

On Fri, 2006-08-04 at 13:41 -0700, Zachary Amsden wrote:
> committed to working on it. Which is why I wanted feedback on what we
> have to do to make sure our ESX implementation is done in a way that is
> acceptable to the community. I too would like to push for an interface
> in 2.6.19, and we can't have confusion on this issue be a last minute
> stopper.

In part that's a legal question, so only a lawyer can really tell you
where the line for derivative works is.

Philosophically I can see the argument that the moment you hit a
hypervisor trap it's akin to running another app (and an app which
communicates via that interface with many other apps), so your Linux
kernel side code would be GPL and whatever it fires up to handle the
trap-cum-syscall probably isn't. But I'm not a lawyer, and neither you
nor anyone else, nor a court reviewing a case, should consider the
statement above a guideline of intent.

Alan

2006-08-04 21:26:18

by Jeremy Fitzhardinge

Subject: Re: A proposal - binary

David Lang wrote:
> how can I compile in support for Xen4 on my 2.6.18 kernel? after all
> xen 2 and xen3 are incompatible hypervisors so why wouldn't xen4 (and
> I realize there is no xen4 yet, but there is likely to be one during
> the time virtual servers created with 2.6.18 are still running)

Firstly, backwards compatibility is very important; I would guess that
if there were a Xen4 ABI, the hypervisor would still support Xen3 for
some time. Secondly, if someone goes to the effort of backporting a
Xen4 paravirtops driver for 2.6.18, then you could compile it in.

> I also am missing something here. how can a system be compiled to do
> several different things for the same privileged opcode (including
> running that opcode) without turning that area of code into a
> performance pig as it checks for each possible hypervisor being present?

Conceptually, the paravirtops structure is a structure of pointers to
functions which get filled in at runtime to support whatever hypervisor
we're running over. But it also has the means to patch inline versions
of the appropriate code sequences for performance-critical operations.
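
The indirection described above can be sketched in plain C. This is only a
minimal illustration, not the real paravirt_ops layout: the struct name, the
two operations, and the simulated "hardware" state are all assumptions made
for the example, and the inline patching of hot paths is not shown.

```c
/* Hypothetical ops table: one function pointer per sensitive operation.
 * The names here are illustrative, not the actual kernel structure. */
struct pv_ops_sketch {
    const char *name;
    unsigned long (*read_cr3)(void);
    void (*irq_disable)(void);
};

/* Simulated machine state standing in for real hardware registers. */
static unsigned long sim_cr3 = 0x1000;
static int sim_irqs_enabled = 1;
static int sim_hypercalls = 0;

/* "Native" backend: stands in for executing the privileged instruction. */
static unsigned long native_read_cr3(void) { return sim_cr3; }
static void native_irq_disable(void) { sim_irqs_enabled = 0; }

/* "Hypervisor" backend: stands in for asking the hypervisor instead. */
static unsigned long hv_read_cr3(void) { sim_hypercalls++; return sim_cr3; }
static void hv_irq_disable(void) { sim_hypercalls++; sim_irqs_enabled = 0; }

static const struct pv_ops_sketch native_ops = {
    "native", native_read_cr3, native_irq_disable
};
static const struct pv_ops_sketch hv_ops = {
    "hypervisor", hv_read_cr3, hv_irq_disable
};

/* The table the rest of the code calls through; filled in once at boot. */
static struct pv_ops_sketch active_ops;

void pv_init(int on_hypervisor)
{
    if (on_hypervisor)
        active_ops = hv_ops;
    else
        active_ops = native_ops;
}

/* Callers never test for a hypervisor; they just go through the table. */
unsigned long pv_read_cr3(void) { return active_ops.read_cr3(); }
void pv_irq_disable(void) { active_ops.irq_disable(); }

/* Accessors so the simulated state can be inspected from outside. */
int pv_hypercall_count(void) { return sim_hypercalls; }
int pv_irqs_enabled(void) { return sim_irqs_enabled; }
```

The cost at each call site is one indirect call rather than a chain of
"which hypervisor is this?" checks, which is the point being made in answer
to the performance question.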

J

2006-08-04 21:40:49

by Bill Rugolsky Jr.

Subject: Re: A proposal - binary

On Fri, Aug 04, 2006 at 02:26:20PM -0700, Jeremy Fitzhardinge wrote:
> >I also am missing something here. how can a system be compiled to do
> >several different things for the same privileged opcode (including
> >running that opcode) without turning that area of code into a
> >performance pig as it checks for each possible hypervisor being present?
>
> Conceptually, the paravirtops structure is a structure of pointers to
> functions which get filled in at runtime to support whatever hypervisor
> we're running over. But it also has the means to patch inline versions
> of the appropriate code sequences for performance-critical operations.

Perhaps Ulrich and Jakub should join this discussion, as the whole
thing sounds like a rehash of the userland ld.so + glibc versioned ABI.
glibc has weathered 64-bit LFS changes to open(), SYSENTER, and vdso.

Isn't this discussion entirely analogous (except for the patching of
performance critical sections, perhaps) to taking a binary compiled
against glibc-2.0 back on Linux-2.2 and running it on glibc-2.4 + 2.6.17?
Or OpenSolaris, for that matter?

Bill Rugolsky

2006-08-04 21:47:47

by Jeff Dike

Subject: Re: A proposal - binary

On Fri, Aug 04, 2006 at 12:49:13PM -0700, David Lang wrote:
> >Why might you have to do that?
>
> take this with a grain of salt, I'm not saying the particular versions I'm
> listing would require this
>
> if your new guest kernel wants to use some new feature (SKAS3, time
> virtualization, etc) but the older host kernel didn't support some system
> call necessary to implement it, you may need to upgrade the host kernel to
> one that provides the new features.

OK, yeah.

Just making sure you weren't thinking that the UML and host versions
were tied together (although a modern distro won't boot on a 2.6 UML
on a 2.4 host because UML's TLS needs TLS support on the host...).

Jeff

2006-08-04 22:02:30

by Andi Kleen

Subject: Re: A proposal - binary

> In the Xen case,
> they may want to run a dom-0 hypervisor which is compiled for an actual
> hardware sub-arch, such as Summit or ES7000.

There is no reason Summit or es7000 or any other subarchitecture
would need to do different virtualization. In fact these subarchitectures
are pretty much obsoleted by the generic subarchitecture and could be fully
done by runtime switching.

I don't disagree with your general point that some kind of PAL
code between kernels and hypervisors might be a good idea
(in fact I think Xen already uses vsyscall pages in some cases for this),
but this particular example is no good.

> I would expect to see these new sub-architectures
> begin to grow like a rash.

I hope not. The i386 subarchitecture setup is pretty bad already
and mostly obsolete for modern systems.

> The same approach can be used on x86_64 for paravirtualization, but also
> to abstract out vendor differences between platforms. Opteron and EM64T
> hardware are quite different, and the plethora of non-standard
> motherboards and uses have already intruded into the kernel. Having a
> clean interface to encapsulate these changes is also desirable here, and
> once we've nailed down a final approach to achieving this for i386, it
> makes sense to do x86_64 as well.

Possible.

>
> I'm now talking lightyears into the future

tststs - please watch your units.

>, but when the i386 and x86_64
> trees merge together,

I don't think that will happen in the way you imagine. I certainly
don't plan to ever merge legacy stuff like Voyager or Visual Workstation
or even 586 multiprocessor support.

It might be that x86-64 grows 32bit support at some point, but certainly
only for modern systems and without the heavyweight subarchitecture setup
that i386 uses.

> this layer will be almost identical for the two,
> allowing sharing of tricky pieces of code, like the APIC and IO-APIC,

No, one of the strong points of the x86-64 port is that APIC/IO-APIC support
doesn't carry all the legacy i386 has to carry.

> NMI handling, system profiling, and power management. If the interface
> evolves in a nicely packaged and compartmentalized way from that, then
> perhaps someday it can grow to become a true cross-architecture way to
> handle machine abstraction and virtualization.

I don't fully agree with moving everything into paravirt ops. IMHO
it should only be done for stuff which is performance critical
or cannot be virtualized.

For most other stuff a Hypervisor can always trap or not bother.

> (N-tiered cache coloring,
> multiway hardware page tables, hypercubic interrupt routing, dynamically
> morphed GPUs, quantum hypervisor isolation).

I have my doubts paravirt ops will ever support any of this @)
If we tried that then it would be so messy that it would turn into
a bad idea.

> Of course, it will still
> require a PCI bus.

And it's unlikely PCI will ever be a good fit for a Quantum computer @)

> I too would like to push for an interface
> in 2.6.19, and we can't have confusion on this issue be a last minute
> stopper.

For 2.6.19 it's too late already. Freeze for its merge
window has already nearly begun and this stuff is not ready yet.

> Maybe someday Xen and VMware can share the same ABI interface and both
> use a VMI like layer.

The problem with VMI is that while it allows hypervisor side evolution
it doesn't really allow Linux side evolution with its fixed spec.

But having it a bit isolated is probably ok.

-Andi

2006-08-04 22:13:12

by Arjan van de Ven

Subject: Re: A proposal - binary

David Lang wrote:
> On Fri, 4 Aug 2006, Arjan van de Ven wrote:
>
>> David Lang wrote:
>>> I'm not commenting on any of the specifics of the interface calls (I
>>> trust you guys to make that be sane :-) I'm just responding to the
>>> idea that the interface actually needs to be locked down to an ABI as
>>> opposed to just source-level compatibility.
>>
>> you are right that the interface to the HV should be stable. But those
>> are going to be specific to the HV, the paravirt_ops allows the kernel
>> to smoothly deal with having different HV's.
>> So in a way it's an API interface to allow the kernel to deal with
>> multiple different ABIs that exist today and will in the future.
>
> so if I understand this correctly we are saying that a kernel compiled
> to run on hypervisor A would need to be recompiled to run on hypervisor
> B, and recompiled again to run on hypervisor C, etc
>
No, the actual implementation of the operation structure is dynamic and can be
picked at runtime, so you can compile a kernel for A, B *and* C, and at runtime
the kernel picks the one you have.
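
A rough sketch of that runtime pick, assuming each compiled-in backend can
recognise its own hypervisor from some signature string. The backend names,
the signature strings, and the detect callbacks are all hypothetical; a real
kernel would probe CPUID leaves, boot parameters, or similar.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one compiled-in hypervisor backend. */
struct hv_backend {
    const char *name;
    int (*detect)(const char *signature);  /* nonzero if this backend fits */
};

/* Each backend recognises only its own (made-up) signature. */
static int detect_a(const char *sig) { return strcmp(sig, "HV-A") == 0; }
static int detect_b(const char *sig) { return strcmp(sig, "HV-B") == 0; }
static int detect_c(const char *sig) { return strcmp(sig, "HV-C") == 0; }

/* Everything the kernel was built with: A, B *and* C. */
static const struct hv_backend compiled_in[] = {
    { "A", detect_a },
    { "B", detect_b },
    { "C", detect_c },
};

/* Walk the compiled-in backends at boot and return the first that matches.
 * Falls back to "native" when no hypervisor is recognised. */
const char *select_backend(const char *signature)
{
    size_t i;

    for (i = 0; i < sizeof(compiled_in) / sizeof(compiled_in[0]); i++) {
        if (compiled_in[i].detect(signature))
            return compiled_in[i].name;
    }
    return "native";
}
```

The probing happens once, so the per-call cost afterwards does not depend on
how many backends were compiled in.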

2006-08-04 22:39:31

by Zachary Amsden

Subject: Re: A proposal - binary

Andi Kleen wrote:
>> In the Xen case,
>> they may want to run a dom-0 hypervisor which is compiled for an actual
>> hardware sub-arch, such as Summit or ES7000.
>>
>
> There is no reason Summit or es7000 or any other subarchitecture
> would need to do different virtualization. In fact these subarchitectures
> are pretty much obsolete by the generic subarchitecture and could be fully
> done by runtime switching.
>

For privileged domains that have hardware privileges and need to send
IPIs or something it might make sense. Otherwise, there is no issue.

>> I would expect to see these new sub-architectures
>> begin to grow like a rash.
>>
>
> I hope not. The i386 subarchitecture setup is pretty bad already
> and mostly obsolete for modern systems.
>

Yes, I hope not too.

>
>> I'm now talking lightyears into the future
>>
>
> tststs - please watch your units.
>

I realized after I wrote it ;)

> I don't fully agree to move everything into paravirt ops. IMHO
> it should be only done for stuff which is performance critical
> or cannot be virtualized.

Yes, this is all just a crazy idea, not an actual proposal.

> And it's unlikely PCI will be ever a good fit for a Quantum computer @)
>

Hmm, a quantum bus would only allow one reader of each quantum bit. So
you couldn't broadcast without daisy chaining everything. Could be an
issue.

>> Maybe someday Xen and VMware can share the same ABI interface and both
>> use a VMI like layer.
>>
>
> The problem with VMI is that while it allows hypervisor side evolution
> it doesn't really allow Linux side evolution with its fixed spec.
>

It doesn't stop Linux from using the provided primitives in any way it
sees fit. So it doesn't stop evolution in that sense. What it does stop
is having the Linux hypervisor interface grow antlers and have new
hooves grafted onto it. What is sorely needed in the interface is a way
to probe and detect optional features that allow it to grow independent
of one particular hypervisor vendor.
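
One common shape for such probing is a feature bitmask that both sides
intersect. The sketch below assumes that shape; the feature names and bit
assignments are invented for illustration and are not taken from the VMI
spec.

```c
#include <stdint.h>

/* Hypothetical optional-feature bits; not from any real specification. */
#define PV_FEAT_BATCHED_MMU    (1u << 0)
#define PV_FEAT_PV_TIMER       (1u << 1)
#define PV_FEAT_EVENT_CHANNELS (1u << 2)

/* The guest states what it can use, the hypervisor what it offers;
 * only features both sides understand end up enabled. */
uint32_t pv_negotiate_features(uint32_t guest_wants, uint32_t hv_offers)
{
    return guest_wants & hv_offers;
}

/* Nonzero if every bit in 'feature' was successfully negotiated. */
int pv_feature_enabled(uint32_t negotiated, uint32_t feature)
{
    return (negotiated & feature) == feature;
}
```

Unknown bits are simply masked off, so either side can add features without
the other having to know about them.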

Zach

2006-08-04 22:43:40

by David Lang

Subject: Re: A proposal - binary

On Fri, 4 Aug 2006, Jeff Dike wrote:

> On Fri, Aug 04, 2006 at 12:49:13PM -0700, David Lang wrote:
>>> Why might you have to do that?
>>
>> take this with a grain of salt, I'm not saying the particular versions I'm
>> listing would require this
>>
>> if your new guest kernel wants to use some new feature (SKAS3, time
>> virtualization, etc) but the older host kernel didn't support some system
>> call necessary to implement it, you may need to upgrade the host kernel to
>> one that provides the new features.
>
> OK, yeah.
>
> Just making sure you weren't thinking that the UML and host versions
> were tied together (although a modern distro won't boot on a 2.6 UML
> on a 2.4 host because UML's TLS needs TLS support on the host...).

this is exactly the type of thing that I think is acceptable.

this is a case of a new client needing a new host.

if you have a server running a bunch of 2.4 UMLs on a 2.4 host and want to add
a 2.6 UML you can do it because you can shift to a bunch of 2.4 UMLs (plus one
2.6 UML) running on a 2.6 host.

what would bother me is if you weren't able to run a 2.4 UML on a 2.6
host, because then you would have locked out the upgrade path.

Everyone needs to remember that this sort of thing does happen: Xen2 clients
cannot run on a Xen3 host.

David Lang

2006-08-04 22:47:17

by David Lang

Subject: Re: A proposal - binary

On Sat, 5 Aug 2006, Andi Kleen wrote:

> The problem with VMI is that while it allows hypervisor side evolution
> it doesn't really allow Linux side evolution with its fixed spec.
>
> But having it a bit isolated is probably ok.

actually, wouldn't something like this allow for a one-way evolution (the spec
can be changed, but the hypervisor side needs to support clients that only talk
older versions, i.e. the new spec is a superset of the old one (barring major
security-type problems that require exceptions to the rules))?
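
As a sketch of that one-way rule, assume each side advertises the range of
spec versions it can speak and the two settle on the newest version in
common. The function name and the plain integer version numbers are
assumptions made for the example.

```c
/* Pick the newest spec version both sides understand.
 * Returns 0 when the ranges do not overlap, i.e. the guest cannot run. */
unsigned negotiate_abi_version(unsigned hv_min, unsigned hv_max,
                               unsigned guest_min, unsigned guest_max)
{
    unsigned newest_common = hv_max < guest_max ? hv_max : guest_max;
    unsigned oldest_common = hv_min > guest_min ? hv_min : guest_min;

    if (newest_common < oldest_common)
        return 0;
    return newest_common;
}
```

A hypervisor that keeps hv_min low stays usable by old guests. One that
raises hv_min reproduces the Xen2-on-Xen3 situation mentioned elsewhere in
the thread: negotiate_abi_version(3, 3, 2, 2) finds no common version.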

David Lang

2006-08-04 22:48:53

by David Lang

Subject: Re: A proposal - binary

On Fri, 4 Aug 2006, Arjan van de Ven wrote:

>>
>> so if I understand this correctly we are saying that a kernel compiled to
>> run on hypervisor A would need to be recompiled to run on hypervisor B, and
>> recompiled again to run on hypervisor C, etc
>>
> no the actual implementation of the operation structure is dynamic and can be
> picked at runtime, so you can compile a kernel for A,B *and* C and at runtime
> the kernel picks the one you have

Ok, I was under the impression that this sort of thing was frowned upon for
hotpath items (which I understand a good chunk of this would be).

this still leaves the question of old clients on new hypervisors that is
continuing in other branches of this thread.

David Lang

2006-08-04 22:52:49

by Andi Kleen

Subject: Re: A proposal - binary


> For privileged domains that have hardware privileges and need to send
> IPIs or something it might make sense.

Any SMP guest needs IPI support of some sort.

But it is hopefully independent of subarchitectures in the paravirtualized
case.


> doesn't stop Linux from using the provided primitives in any way it
> sees fit. So it doesn't stop evolution in that sense. What it does stop
> is having the Linux hypervisor interface grow antlers and have new
> hooves grafted onto it. What is sorely needed in the interface is a way
> to probe

That's the direction the interface is evolving I think (see multiple
entry point discussion)

> and detect optional features that allow it to grow independent
> of one particular hypervisor vendor.

Ok maybe not with options and subsets so far, but one has to
start somewhere.

-Andi

2006-08-05 00:06:40

by Paul Mackerras

Subject: Re: A proposal - binary

Theodore Tso writes:

> IBM's virtualization *does* have magic blobs; it's called the
> hypervisor. The difference is that the PowerPC have a delibierately
> castrated architecture such that when you are running a guest
> operating system in an LPAR, so that when you do things like mess with
> page tables (for example), it traps to the hypervisor which is really

Well no. When the kernel wants to change the hardware page tables it
doesn't even try to do it itself, it calls the hypervisor via the
"hypervisor system call" instruction. It's entirely analogous to a
program calling the "write" system call to send output to a terminal
rather than trying to drive the serial port directly (via outb
instructions or whatever).

> "a magic binary blob" running on the bare Power architecture. The

Not really. Not any more than the Windows kernel is a "magic binary
blob" in a GPL'd program running under Windows.

> difference is that the way you trap into the hypervisor is via a
> PowerPC instructure that looks like a native instruction call.

Wow, you must have been really tired when you wrote that... :)

> The bottom line is that the line between magic binary blobs and
> whether or not they are legal or not is more of a grey line than we
> might want to admit.

It's quite clear that (a) being in a separate address space, (b) having
a defined, documented interface, and (c) being used by multiple
different client OSes are together pretty good evidence that something is an
independent work, not a derived work of the kernel, and therefore not
subject to the GPL.

Paul.

2006-08-05 04:16:09

by James Bottomley

Subject: Re: A proposal - binary

On Fri, 2006-08-04 at 22:26 +0100, Alan Cox wrote:
> In part that's a legal question so only a lawyer can really tell you what
> is and isn't the line for derivative works.

Actually, this isn't quite true. In any licensing agreement between two
parties, what each thinks is an important consideration in the
enforcement of the agreement. This is how we got binary modules in the
first place, and so it also follows that what kernel developers think
about this proposal is an important influence on the eventual legal
opinion.

My take is that the VMI proposal breaks down into two pieces:

1) A hypervisor ABI. This is easy: we maintain ABIs today between libc
and the kernel, so nothing about an ABI is inherently GPL-violating.

2) A gateway page or vDSO provided by the hypervisor to the kernel.
This is the problematic piece, because the vDSO is de facto linked into
the kernel and as such becomes subject to the prevailing developer
interpretation as being a derivative work by being linked in. As Arjan
pointed out, this can be avoided as long as the gateway page itself is
GPL ... we could even create mechanisms like we use today for module
licensing by having a tag in the VMI describing the licensing of the
gateway page, so the kernel could be made only to load gateway pages
that promise they're available under the GPL.
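
The tagging idea can be sketched as a simple check before the gateway page
is mapped. This is only an illustration: the function name is hypothetical,
and the list of accepted strings is an assumption loosely modelled on the
kind of license strings used for module licensing, not a definitive list.

```c
#include <stddef.h>
#include <string.h>

/* Assumed set of tags treated as GPL-compatible for this sketch. */
static const char *const gpl_compatible_tags[] = {
    "GPL",
    "GPL v2",
    "Dual BSD/GPL",
};

/* Returns 1 if the gateway page's license tag promises GPL availability,
 * 0 otherwise. A missing tag is treated the same as a non-GPL one. */
int gateway_license_ok(const char *tag)
{
    size_t i;

    if (tag == NULL)
        return 0;
    for (i = 0; i < sizeof(gpl_compatible_tags) / sizeof(gpl_compatible_tags[0]); i++) {
        if (strcmp(tag, gpl_compatible_tags[i]) == 0)
            return 1;
    }
    return 0;
}
```

As with module license tags, this only checks what the blob claims about
itself; whether the claim is true is outside what the kernel can verify.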

I think that if we do this tagging to load the VMI vDSO interface, then
I'm happy that all of the legal niceties are safely taken care of.
(Although the onus is now back on VMware to establish if they can GPL
their VMI blob).

James


2006-08-05 04:18:47

by James Bottomley

Subject: Re: A proposal - binary

On Fri, 2006-08-04 at 13:41 -0700, Zachary Amsden wrote:
> Instead, what paravirt-ops promises long
> term is a way to get rid of the sub-architecture layer altogether.
> Sub-arches like Voyager and Visual workstation have some strange
> initialization requirements, interrupt controllers, and SMP handling.
> Exactly the kind of thing which paravirt_ops is being designed to
> indirect for hypervisors.

Well ... I agree that in principle it's possible to have a kernel that
would run on both voyager and a generic x86 system and, I'll admit, I
tried to go that route before creating the subarchitectures. However,
in practice, I think the cost really becomes too high ... for voyager,
it becomes necessary really to intercept almost the entirety of the
SMP API. The purpose of the subarchitecture interface wasn't to
eventually have some API description that would allow voyager to
co-exist with more normal x86 systems. It was to allow voyager to make
use of generic x86 while being completely different at the x86 SMP
level. I really don't think there'll ever be another x86 machine that's
as different from the APIC approach as the voyager VIC/QIC is. Thus, I
think the actual x86 interface is much better described by mach-generic,
which abstracts out the interfaces necessary to the more standard APIC
based SMP systems.

James


2006-08-05 04:33:38

by Zachary Amsden

Subject: Re: A proposal - binary

James Bottomley wrote:
> Well ... I agree that in principle it's possible to have a kernel that
> would run on both voyager and a generic x86 system and, I'll admit, I
> tried to go that route before creating the subarchitectures. However,
> in practice, I think the cost really becomes too high ... for voyager,
> it becomes necessary really to intercept almost the entirety of the
> SMP API. The purpose of the subarchitecture interface wasn't to
> eventually have some API description that would allow voyager to
> co-exist with more normal x86 systems. It was to allow voyager to make
> use of generic x86 while being completely different at the x86 SMP
> level. I really don't think there'll ever be another x86 machine that's
> as different from the APIC approach as the voyager VIC/QIC is. thus, I
> think the actual x86 interface is much better described by mach-generic,
> which abstracts out the interfaces necessary to the more standard APIC
> based SMP systems.
>

This is quite true today. But it is entirely possible that support in
Linux for Xen may want to rip out the APIC / IO-APIC entirely, replace
that with event channels, and use different SMP shootdown mechanisms, as
well as having their own special NMI delivery hook. We're also going to
have to make certain parts of the interface extremely efficient, and
we've already got several schemes to remove the penalty of indirection
by getting rid of indirect branches - which could be a more broadly used
technique if it proves unintrusive and reliable enough. In that case,
you could basically support Voyager without a subarch, plus or minus one
special hook or two ;)

Zach

2006-08-05 05:37:05

by Zachary Amsden

Subject: Re: A proposal - binary

James Bottomley wrote:
> My take is that the VMI proposal breaks down into two pieces:
>

This is a very accurate description of our interface.

> 1) A hypervisor ABI. This is easy: we maintain ABIs today between libc
> and the kernel, so nothing about an ABI is inherently GPL violating.
>

This I think is an absolute must for any sane interpretation of the
GPL. Otherwise, running GPL apps on any proprietary operating system
falls into the same situation, and you wouldn't be able to run them
without violating the GPL. Nor would you be able to run non-GPL
applications on a GPL kernel. It's really a matter of whether you
interpret the intent of the GPL to prevent someone deriving work from
your open source software and distributing that in binary form without
providing the derived work - or if you interpret the GPL as saying that
all code must be open sourced. The latter is a very extreme position,
and I don't believe it is even correct with the current wording of GPL
v2 (IANAL).

> 2) A gateway page or vDSO provided by the hypervisor to the kernel.
> This is the problematic piece, because the vDSO is de-facto linked into
> the kernel and as such becomes subject to the prevailing developer
> interpretation as being a derivative work by being linked in. As Arjan
> pointed out, this can be avoided as long as the gateway page itself is
> GPL ... we could even create mechanisms like we use today for module
> licensing by having a tag in the VMI describing the licensing of the
> gateway page, so the kernel could be made only to load gateway pages
> that promise they're available under the GPL.
>

Yes, this is what prompted my whole module rant. The interesting thing
is - Linux may link to the hypervisor vDSO. But it may not link back
into Linux. This is where the line becomes very gray, as Theodore
mentioned earlier. Is it a license violation for a GPL app to link
against a non-GPL library? Surely, the other way around is a problem,
unless the library has been made explicitly LGPL. But if GPL apps can
link to non-GPL libraries, what stops GPL kernels from linking to
non-GPL modules? This is where I think things become more interpretive
than well defined. And that is why it is important for us to get kernel
developers' feedback on exactly what that definition is.

> I think that if we do this tagging to load the VMI vDSO interface, then
> I'm happy that all of the legal niceties are safely taken care of.
> (Although the onus is now back on VMware to establish if they can GPL
> their VMI blob).
>

Tagging is interesting. You can tag modules by license. I can't say
today what license we will be able to use for it - it could be
completely proprietary, some new license, BSD, or GPL. This is the
essence of my original rant - it would be nice to have a way to tag the
license in the "blob" so the kernel can choose the appropriate course of
action. In that case, the pressure is off me to specify what the
license is - it's there for everyone to see, and then it is just a
matter of coming to a consensus as to what an acceptable license is for
Linux to link to it. What license(s) we provide is really not up to me,
although I personally would very much like to see an open source license
that allows everyone to see the code, fix any problems they have with
it, and distribute those fixes (purely my own personal opinion, and in
no way a statement, promise, or supposition in any legal or corporate
sense for any past, present, or future work by VMware, EMC, or any other
entity, wholly or partially owned by said corporations, and in no way
should this be interpreted as constituting a legal opinion for the
purposes of advice or rendering of any court decision, now, in the
future, or in the past for legal arbiters with access to time travel
equipment). <Now I'm covered better than Alan>.

Binary blob has been a PR disaster. I don't know if I first said it
unprompted, or if Christoph cleverly baited me into using the phrase ;)
But let's be clear on one thing - blob implies some kind of shapeless,
fat thing. The VMI fits in two pages of memory, and has a well defined
interface, which gives it shape. So I prefer binary redirection
interface, or vDSO, or anything without the disparaged word "blob" in it.

Zach

2006-08-05 10:43:01

by Adrian Bunk

Subject: Re: A proposal - binary

On Fri, Aug 04, 2006 at 10:37:01PM -0700, Zachary Amsden wrote:
> James Bottomley wrote:
>...
> >2) A gateway page or vDSO provided by the hypervisor to the kernel.
> >This is the problematic piece, because the vDSO is de-facto linked into
> >the kernel and as such becomes subject to the prevailing developer
> >interpretation as being a derivative work by being linked in. As Arjan
> >pointed out, this can be avoided as long as the gateway page itself is
> >GPL ... we could even create mechanisms like we use today for module
> >licensing by having a tag in the VMI describing the licensing of the
> >gateway page, so the kernel could be made only to load gateway pages
> >that promise they're available under the GPL.
>
> Yes, this is what prompted my whole module rant. The interesting thing
> is - Linux may link to the hypervisor vDSO. But it may not link back
> into Linux. This is where the line becomes very gray, as Theodore
> mentioned earlier. Is it a license violation for a GPL app to link
> against a non-GPL library? Surely, the other way around is a problem,

I don't see the grey area.

Assuming non-GPL and not GPL compatible (e.g. 3 clause BSD is non-GPL
but compatible):

Unless all people holding a copyright on the GPL app agreed that this
linking is OK, it is considered a licence violation.

That's why you often see licence statements like the following:

"In addition, as a special exception, the Free Software Foundation
gives permission to link the code of its release of Wget with the
OpenSSL project's "OpenSSL" library (or with modified versions of it
that use the same license as the "OpenSSL" library), and distribute
the linked executables. You must obey the GNU General Public License
in all respects for all of the code used other than "OpenSSL". If you
modify this file, you may extend this exception to your version of the
file, but you are not obligated to do so. If you do not wish to do
so, delete this exception statement from your version."

> unless the library has been made explicitly LGPL. But if GPL apps can
> link to non-GPL libraries, what stops GPL kernels from linking to
> non-GPL modules? This is where I think things become more interpretive
> than well defined. And that is why it is important for us to get kernel
> developers feedback on exactly what that definition is.
>...

Some kernel developers (and some lawyers) consider all kernel modules
with not GPL compatible licences illegal - similar to the case of
linking a GPL app with a non-GPL library.

Quoting Novell [1]:

"Most developers of the kernel community consider non-GPL kernel
modules to be infringing on their copyright. Novell does respect this
position, and will no longer distribute non-GPL kernel modules as part
of future products. Novell is working with vendors to find alternative
ways to provide the functionality that was previously only available
with non-GPL kernel modules."

And considering the number of people having a copyright on parts of the
kernel, there's no one except a court who can tell what is OK and what is
not (and even a court decision is not binding for courts in other
countries).

> Zach

cu
Adrian

[1] http://lists.opensuse.org/archive/opensuse-announce/2006-Feb/0004.html

--

Gentoo kernels are 42 times more popular than SUSE kernels among
KLive users (a service by SUSE contractor Andrea Arcangeli that
gathers data about kernels from many users worldwide).

There are three kinds of lies: Lies, Damn Lies, and Statistics.
Benjamin Disraeli

2006-08-05 10:47:40

by Adrian Bunk

Subject: Re: A proposal - binary

On Sat, Aug 05, 2006 at 12:01:52AM +0200, Andi Kleen wrote:
>
> There is no reason Summit or es7000 or any other subarchitecture
> would need to do different virtualization. In fact these subarchitectures
> are pretty much obsolete by the generic subarchitecture and could be fully
> done by runtime switching.
>...

Has anyone measured the performance impact of runtime CLOCK_TICK_RATE
switching (since this will no longer allow some compile time
optimizations in jiffies.h)?
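
To make the concern concrete, here is a small sketch of the difference. The
SKETCH_ names are hypothetical stand-ins and the rounding formula is modelled
on the usual "tick rate divided by HZ" timer divisor, so treat it as an
illustration of the compile-time versus runtime trade-off rather than the
actual jiffies.h contents.

```c
/* Assumed values for the sketch: 1000 ticks per second and the 1193182 Hz
 * input clock of the PC's i8254 timer. */
#define SKETCH_HZ              1000UL
#define SKETCH_CLOCK_TICK_RATE 1193182UL

/* With both inputs fixed at build time the compiler folds this whole
 * expression into the constant 1193; no division happens at runtime. */
#define SKETCH_LATCH ((SKETCH_CLOCK_TICK_RATE + SKETCH_HZ / 2) / SKETCH_HZ)

unsigned long latch_compile_time(void)
{
    return SKETCH_LATCH;
}

/* With runtime switching the tick rate becomes a variable, so the same
 * expression needs a load and a division every time it is evaluated
 * (or a cached result that must be kept in sync). */
static unsigned long runtime_clock_tick_rate = SKETCH_CLOCK_TICK_RATE;

void set_clock_tick_rate(unsigned long rate_hz)
{
    runtime_clock_tick_rate = rate_hz;
}

unsigned long latch_runtime(void)
{
    return (runtime_clock_tick_rate + SKETCH_HZ / 2) / SKETCH_HZ;
}
```

Whether the extra work is measurable depends on how often such derived
values are recomputed, which is exactly what the question asks someone to
benchmark.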

> -Andi

cu
Adrian

2006-08-05 10:51:27

by Adrian Bunk

Subject: Re: [v4l-dvb-maintainer] Options depending on STANDALONE

On Thu, Aug 03, 2006 at 04:40:25PM -0700, Trent Piepho wrote:
> On Thu, 3 Aug 2006, Adrian Bunk wrote:
> > On Thu, Aug 03, 2006 at 03:56:17PM -0400, Dave Jones wrote:
> > > You're describing PREVENT_FIRMWARE_BUILD. The text Zach quoted is from
> > > STANDALONE, which is something else completely. That allows us to not
> > > build drivers that pull in things from /etc and the like during compile.
> > > (Whoever thought that was a good idea?)
> >
> >
> > Is DVB_AV7110_FIRMWARE really still required?
> > ALL other drivers work without such an option.
>
> The other DVB drivers that need firmware load it when the device is opened
> or used (ie. a channel is tuned). At least for the ones I'm familiar
> with. If they are compiled directly into the kernel, they can still use
> FW_LOADER since the loading won't happen until well after booting is
> done.
>
> For AV7110, it looks like the firmware loading is done when the driver is
> first initialized. If AV7110 is compiled into the kernel, FW_LOADER can
> not be used. The filesystem with the firmware won't be mounted yet.
>
> So AV7110 has an option to compile a firmware file into the driver.

But is there a technical reason why this has to be done this way?

This is the only (non-OSS) driver doing it this way, and Zach has a
point that this is legally questionable.

cu
Adrian


2006-08-05 11:31:55

by Alan

Subject: Re: A proposal - binary

On Fri, 2006-08-04 at 22:37 -0700, Zachary Amsden wrote:
> mentioned earlier. Is it a license violation for a GPL app to link
> against a non-GPL library? Surely, the other way around is a problem,


Actually the FSF always anticipated that case, because it's the same as
the GPL-app-on-non-free-OS case, and the GPL there says:

"However, as a special exception, the source code distributed need not
include anything that is normally distributed (in either source or
binary form) with the major components (compiler, kernel, and so on) of
the operating system on which the executable runs, unless that component
itself accompanies the executable."

> interface, which gives it shape. So I prefer binary redirection
> interface, or vDSO, or anything without the disparaged word "blob" in it.

Well, if you are going to provide the source then it's not really a binary
interface, it's a jump table.
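
For readers unfamiliar with the term, a jump table in this sense can be sketched as a struct of function pointers that callers go through instead of naming an implementation directly. The sketch below is illustrative only: struct hv_ops_sketch, its fields, and the "native" backend are hypothetical names, not the VMI interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation table; the field names are illustrative only. */
struct hv_ops_sketch {
	unsigned long (*read_flags)(void);
	void (*write_flags)(unsigned long flags);
};

/* Stands in for some piece of privileged CPU state. */
static unsigned long fake_flags;

static unsigned long native_read_flags(void)
{
	return fake_flags;
}

static void native_write_flags(unsigned long flags)
{
	fake_flags = flags;
}

/* One backend's table; another backend would supply its own functions. */
static const struct hv_ops_sketch native_ops = {
	.read_flags  = native_read_flags,
	.write_flags = native_write_flags,
};

/* Callers dispatch through the table and never name a backend. */
static unsigned long set_and_read_flags(const struct hv_ops_sketch *ops,
					unsigned long flags)
{
	assert(ops != NULL && ops->read_flags != NULL && ops->write_flags != NULL);
	ops->write_flags(flags);
	return ops->read_flags();
}
```

Swapping the backend then means swapping the table pointer, which is the same indirection whether the functions behind it ship as source or as a binary.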

2006-08-05 11:57:45

by Andi Kleen

Subject: Re: A proposal - binary


> Has anyone measured the performance impact of runtime CLOCK_TICK_RATE
> switching (since this will no longer allow some compile time
> optimizations in jiffies.h)?

SUSE shipped a kernel briefly that had runtime switchable jiffies
and there were some benchmarks done and they didn't show noticeable
slowdown.

But with hr timers it should be pretty much obsolete anyways.

-Andi

2006-08-06 11:19:37

by Oliver Endriss

Subject: Re: [v4l-dvb-maintainer] Options depending on STANDALONE

Adrian Bunk wrote:
> On Thu, Aug 03, 2006 at 04:40:25PM -0700, Trent Piepho wrote:
> > On Thu, 3 Aug 2006, Adrian Bunk wrote:
> > > On Thu, Aug 03, 2006 at 03:56:17PM -0400, Dave Jones wrote:
> > > > You're describing PREVENT_FIRMWARE_BUILD. The text Zach quoted is from
> > > > STANDALONE, which is something else completely. That allows us to not
> > > > build drivers that pull in things from /etc and the like during compile.
> > > > (Whoever thought that was a good idea?)
> > >
> > > Is DVB_AV7110_FIRMWARE really still required?
> > > ALL other drivers work without such an option.
> >
> > The other DVB drivers that need firmware load it when the device is opened
> > or used (ie. a channel is tuned). At least for the ones I'm familiar
> > with. If they are compiled directly into the kernel, they can still use
> > FW_LOADER since the loading won't happen until well after booting is
> > done.
> >
> > For AV7110, it looks like the firmware loading is done when the driver is
> > first initialized. If AV7110 is compiled into the kernel, FW_LOADER can
> > not be used. The filesystem with the firmware won't be mounted yet.
> >
> > So AV7110 has an option to compile a firmware file into the driver.
>
> But is there a technical reason why this has to be done this way?
>
> This is the only (non-OSS) driver doing it this way, and Zach has a
> point that this is legally questionable.

This option _is_ useful because it allows a user to build an
av7110 driver without hotplug etc. I NAK any attempt to remove it.

Sorry, a kernel option cannot cause a legal issue. Only the user does.
For non-distribution kernels there is no difference whether firmware is
loaded at run-time or compiled-in.

Obviously, there might be a difference for distribution kernels if you
are not allowed to distribute the firmware (imho not a problem in this
case, but IANAL). Simple solution: Do not enable the option.

I have no problem if you want to remove STANDALONE: Simply remove the
dependency to STANDALONE, but keep DVB_AV7110_FIRMWARE with default 'n'.

CU
Oliver

--
--------------------------------------------------------
VDR Remote Plugin available at
http://www.escape-edv.de/endriss/vdr/
--------------------------------------------------------

2006-08-06 22:02:35

by Pavel Machek

Subject: Re: A proposal - binary

Hi!

> >You're making a very good argument as to why we should probably
> >require that the code linking against such an interface, if we
> >decide we want one, should be required to be open source.
>
> Personally, I don't feel a strong requirement that it be
> open source, because I don't believe it violates the
> intent of the GPL license by crippling free distribution
> of the kernel, requiring some fee for use, or doing
> anything unethical. There have been charges that the
> VMI layer is deliberately designed as a GPL
> circumvention device, which I want to stamp out now
> before we try to get any code for integrating to it
> upstreamed.

Maybe it is not designed to be evil, but...

> >>I think you will see why our VMI layer is quite similar to a
> >>traditional ROM, and very dissimilar to an evil GPL-circumvention
> >>device.
> >
> >>(?) There are only two reasonable objections I can see to open
> >>sourcing the binary layer.
> >
> >Since none of the vendors that might use such a paravirtualized
> >ROM for Linux actually have one of these reasons for keeping their
> >paravirtualized ROM blob closed source, I say we might as well
> >require that it be open source.
>
> I think saying require at this point is a bit
> preliminary for us -- I'm trying to prove we're not
> being evil and subverting the GPL, but I'm also not
> guaranteeing yet that we can open-source the code under
> a specific license. Sorry about having to doublespeak

...it should be very easy to opensource a simple 'something' layer. If
it is so complex it is 'hard' to opensource, it is misdesigned,
anyway... so fix the design.

My proposal would be: add an open-source hypervisor interface, and keep
it updated for a while. If it is too hard to keep updated, we'll have
to solve it, somehow, but let's not overengineer it now.
Pavel
--
Thanks for all the (sleeping) penguins.

2006-08-06 22:45:31

by Zachary Amsden

Subject: Re: A proposal - binary

Pavel Machek wrote:
> ...it should be very easy to opensource simple 'something' layer. If
> it is so complex it is 'hard' to opensource, it is misdesigned,
> anyway... so fix the design.
>

It's not a design issue - it's a legal issue at this point, and one that
I'm not qualified to come up with a good answer for. The biggest
technical issue I think for open sourcing the VMI, is that it is not
part of the kernel, but stand alone firmware with a rather bizarre build
environment, so the code alone is not sufficient to allow it to be open
sourced, but this is not a hard problem to solve.

Zach

2006-08-06 22:59:47

by Greg KH

Subject: Re: A proposal - binary

On Sun, Aug 06, 2006 at 03:45:29PM -0700, Zachary Amsden wrote:
> Pavel Machek wrote:
> >...it should be very easy to opensource simple 'something' layer. If
> >it is so complex it is 'hard' to opensource, it is misdesigned,
> >anyway... so fix the design.
> >
>
> It's not a design issue - it's a legal issue at this point, and one that
> I'm not qualified to come up with a good answer for.

Then I suggest you press these issues with those within your company who
are qualified to answer it. Otherwise we will not get anywhere with
this line of discussion :(

As someone who has dealt with many corporate lawyers with topics like
this, I wish you lots of luck,

greg k-h

2006-08-07 17:29:55

by Thomas Renninger

Subject: RE: Options depending on STANDALONE

On Thu, 2006-08-03 at 16:49 -0400, Brown, Len wrote:
> >On Thu, Aug 03, 2006 at 10:25:43PM +0200, Adrian Bunk wrote:
> >> ACPI_CUSTOM_DSDT seems to be the most interesting case.
> >> It's anyway not usable for distribution kernels, and AFAIR the ACPI
> >> people prefer to get the kernel working with all original DSDTs
> >> (which usually work with at least one other OS) than letting
> >> the people workaround the problem by using a custom DSDT.
> >
> >Not true at all. For SuSE kernels, we have a patch that lets people
> >load a new DSDT from initramfs due to broken machines requiring a
> >replacement in order to work properly.
>
> CONFIG_ACPI_CUSTOM_DSDT allows hackers to debug their system
> by building a modified DSDT into the kernel to over-ride what
> came with the system. It would make no sense for a distro
> to use it, unless the distro were shipping only on 1 model machine.
> This technique is necessary for debugging, but makes no
> sense for production.
>
> The initramfs method shipped by SuSE is more flexible, allowing
> the hacker to stick the DSDT image in the initrd and use it
> without re-compiling the kernel.
>
> I have refused to accept the initrd patch into Linux many times,
> and always will.
>
> I've advised SuSE many times that they should not be shipping it,
> as it means that their supported OS is running on modified firmware --
> which, by definition, they can not support.
Tainting the kernel when this is done should be sufficient.
> Indeed, one could view
> this method as counter-productive to the evolution of Linux --
> since it is our stated goal to run on the same machines that Windows
> runs on -- without requiring customers to modify those machines
> to run Linux.

There are three reasons for the initrd patch (last one also applies for
the compile in functionality):

1)
There might be "BIOS bugs" that will never get fixed:
https://bugzilla.novell.com/show_bug.cgi?id=160671
(Because it's an obvious BIOS bug, "compatibility" fixing it could make
things worse).

2)
There might be "ACPICA/kernel bugs" that take a while until they get
fixed:

This happens often. A new machine comes out that uses AML in a
slightly different way, and we need to fix it in kernel/ACPICA. It may
take a month or two until the patch appears in mainline, and half a year
or more until the distro of your choice ships a release with the fix.
Backporting ACPICA fixes to older kernels is currently not possible,
because ACPICA patches arrive as large batches of several thousand lines.
Hopefully this changes soon.

Examples that come to mind:
- alias broken in certain cases
https://bugzilla.novell.com/show_bug.cgi?id=113099
- recon amount of elements in packages
https://bugzilla.novell.com/show_bug.cgi?id=189488
- wrong offsets at Field and Operation Region declarations
-> should be compatible for quite a while now
- ...

3)
Debugging.
This is why at least one override method, compiled in or via initrd, must
be provided in the mainline kernel IMHO. Intel people themselves ask bug
reporters to override ACPI tables with a patched table to debug the system.
Do you really think ripping out all overriding functionality from the
kernel is a good idea?

Thomas

It is true that some users are happy with a fixed DSDT, even if you tell
them to find the root cause..., but sooner or later they always come
back.

2006-08-07 17:57:27

by Greg KH

Subject: Re: Options depending on STANDALONE

On Mon, Aug 07, 2006 at 07:33:31PM +0200, Thomas Renninger wrote:
> On Thu, 2006-08-03 at 16:49 -0400, Brown, Len wrote:
> > >On Thu, Aug 03, 2006 at 10:25:43PM +0200, Adrian Bunk wrote:
> > >> ACPI_CUSTOM_DSDT seems to be the most interesting case.
> > >> It's anyway not usable for distribution kernels, and AFAIR the ACPI
> > >> people prefer to get the kernel working with all original DSDTs
> > >> (which usually work with at least one other OS) than letting
> > >> the people workaround the problem by using a custom DSDT.
> > >
> > >Not true at all. For SuSE kernels, we have a patch that lets people
> > >load a new DSDT from initramfs due to broken machines requiring a
> > >replacement in order to work properly.
> >
> > CONFIG_ACPI_CUSTOM_DSDT allows hackers to debug their system
> > by building a modified DSDT into the kernel to over-ride what
> > came with the system. It would make no sense for a distro
> > to use it, unless the distro were shipping only on 1 model machine.
> > This technique is necessary for debugging, but makes no
> > sense for production.
> >
> > The initramfs method shipped by SuSE is more flexible, allowing
> > the hacker to stick the DSDT image in the initrd and use it
> > without re-compiling the kernel.
> >
> > I have refused to accept the initrd patch into Linux many times,
> > and always will.
> >
> > I've advised SuSE many times that they should not be shipping it,
> > as it means that their supported OS is running on modified firmware --
> > which, by definition, they can not support.
> Tainting the kernel if done so should be sufficient.
> > Indeed, one could view
> > this method as counter-productive to the evolution of Linux --
> > since it is our stated goal to run on the same machines that Windows
> > runs on -- without requiring customers to modify those machines
> > to run Linux.
>
> There are three reasons for the initrd patch (last one also applies for
> the compile in functionality):

<snip>

Yeah, you and others within SuSE have convinced me to not drop this
patch from our kernel tree.

Sorry Len.

thanks,

greg k-h

2006-08-07 19:14:25

by Éric Piel

Subject: Re: Options depending on STANDALONE

08/07/2006 07:33 PM, Thomas Renninger wrote:
>
> There are three reasons for the initrd patch (last one also applies for
> the compile in functionality):
Hi, I just happen to be the maintainer of "this initrd patch" ;-) I agree
with you Thomas. IMHO, this patch is really useful in our "not so
perfect" world. A few more comments below:

>
> 1)
> There might be "BIOS bugs" that will never get fixed:
> https://bugzilla.novell.com/show_bug.cgi?id=160671
> (Because it's an obvious BIOS bug, "compatibility" fixing it could make
> things worse).
This is really feature #1. PC manufacturers sometimes do extremely
ugly things when they code their ACPI tables. You can find lots of BIOSes
whose ACPI tables contain tests like "do this if OS name is 13
letters long, and that if OS name is 11 letters long..." Obviously
Linux is most of the time not covered by those tests!
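
To illustrate the kind of test described here, the following is a rough sketch written in C for brevity; real tables express such logic in ASL/AML, and the enum, the function name, and the branch meanings are all hypothetical. The point is only that an OS whose name matches neither length falls through to whatever the vendor left as the untested default.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical code paths a firmware author might select between. */
enum quirk_path { PATH_A, PATH_B, PATH_DEFAULT };

/* Chooses a path from the length of the reported OS name only. */
static enum quirk_path pick_path_by_os_name(const char *os_name)
{
	assert(os_name != NULL);

	size_t len = strlen(os_name);

	if (len == 13)
		return PATH_A;
	if (len == 11)
		return PATH_B;
	/* Any other name, "Linux" included, lands here. */
	return PATH_DEFAULT;
}
```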

1.5) Feature adding. Some (crazy?) people are working on a new
implementation of their ACPI tables to add features (cf the "Smart
Battery System for Linux" project).

In those two cases, you really can't expect every user to recompile their
Linux kernel to get a new DSDT table :-)

> 2)
> There might be "ACPICA/kernel bugs" that take a while until they get
> fixed:
>
> This happens often. There comes out a new machine, using AML in a
> slightly other way, we need to fix it in kernel/ACPICA. Until the patch
> appears mainline may take a month or two. Until the distro of your
> choice that makes use of the fix comes out might take half a year or
> more...
> And backporting ACPICA fixes to older kernels is currently not possible
> as ACPICA patches appear in a big bunch of some thousand lines patches.
> But this hopefully changes soon.
>
> In my mind come:
> - alias broken in certain cases
> https://bugzilla.novell.com/show_bug.cgi?id=113099
> - recon amount of elements in packages
> https://bugzilla.novell.com/show_bug.cgi?id=189488
> - wrong offsets at Field and Operation Region declarations
> -> should be compatible for quite a while now
> - ...
Agreed, although I think of this as more of an excuse than a reason.
Linux is still full of bugs, lots of which cannot be fixed by ACPI table
swapping anyway...

> 3)
> Debugging.
> This is why at least compile in or via initrd must be provided in
> mainline kernel IMHO. Intel people themselves ask the bug reporter to
> override ACPI tables with a patched table to debug the system.
> Do you really think ripping out all overriding functionality from the
> kernel is a good idea?
Well, I think even Len agrees with this usage :-)

All in all, I'm really _not_ asking for inclusion of the patch in the
main tree. I'm just asking you not to think too badly of the distros
which use this patch ;-) (IIRC, at least Mandriva and Ubuntu include it
in addition to SuSE)

See you,
Eric

2006-08-08 00:13:13

by Pavel Machek

Subject: Re: A proposal - binary

On Sun 2006-08-06 15:45:29, Zachary Amsden wrote:
> Pavel Machek wrote:
> >...it should be very easy to opensource simple 'something' layer. If
> >it is so complex it is 'hard' to opensource, it is misdesigned,
> >anyway... so fix the design.
> >
>
> It's not a design issue - it's a legal issue at this point, and one that
> I'm not qualified to come up with a good answer for. The biggest
> technical issue I think for open sourcing the VMI, is that it is not
> part of the kernel, but stand alone firmware with a rather bizarre build
> environment, so the code alone is not sufficient to allow it to be open
> sourced, but this is not a hard problem to solve.

Well, I guess we'd like VMI to be buildable in normal kernel build
tools ... and at that point, open sourcing it should be _really_ easy.

And we'd prefer legal decisions not to influence technical ones. Maybe
we will decide to use binary interface after all, but seeing GPLed,
easily-buildable interface, first, means we can look at both solutions
and decide which one is better.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-08 00:42:25

by Zachary Amsden

Subject: Re: A proposal - binary

Pavel Machek wrote:
> Well, I guess we'd like VMI to be buildable in normal kernel build
> tools ... and at that point, open sourcing it should be _really_ easy.
>
> And we'd prefer legal decisions not to influence technical ones. Maybe
> we will decide to use binary interface after all, but seeing GPLed,
> easily-buildable interface, first, means we can look at both solutions
> and decide which one is better.

I don't think you're actually arguing for the VMI ROM to be built into
the kernel. But since this could be a valid interpretation of what you
said, let me address that point so other readers of this thread don't
misinterpret.

On a purely technical level, the VMI layer must not be part of the
normal kernel build. It must be distributed by the hypervisor to which
it communicates. This is what provides hypervisor independence and
hardware compatibility, and why it can't be distributed with the
kernel. The kernel interfaces for VMI that are part of the kernel
proper are already completely open sourced and GPL'd. The piece in
question is the hypervisor specific VMI layer, which we have not yet
released an open source distribution of.

We do use standard tools for building it, for the most part - although
some perl scripting and elf munging magic is part of the build.
Finally, since it is a ROM, we have to use a post-build tool to convert
the extracted object to a ROM image and fix up the checksum. We don't
have a problem including any of those tools in an open source
distribution of the VMI ESX ROM once we finish sorting through the
license issues. We've already fixed most of the problems we had with
entangled header files so that we can create a buildable tarball that
requires only standard GNU compilers, elf tools, and perl to run. I
believe the only technical issue left is fixing the makefiles so that
building it doesn't require our rather complicated make system.
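
As a rough idea of what a checksum fix-up step can look like, here is a minimal sketch assuming the conventional PC option-ROM rule that all bytes of the image sum to zero modulo 256. The thread does not say how the VMI ROM checksum is defined, so both that rule and the choice of the last byte as the checksum slot are assumptions, and the function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all image bytes, modulo 256. */
static uint8_t rom_sum8(const uint8_t *rom, size_t len)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = (uint8_t)(sum + rom[i]);
	return sum;
}

/* Overwrite the last byte so that the whole image sums to zero. */
static void rom_fix_checksum(uint8_t *rom, size_t len)
{
	assert(rom != NULL && len > 0);

	rom[len - 1] = 0;
	rom[len - 1] = (uint8_t)(0u - rom_sum8(rom, len));
}
```

A loader can then validate an image by checking that rom_sum8() returns zero over its full length.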

Hopefully we can have all this resolved soon so that you can build and
distribute your own ROM images, see how the code operates, and use the
base design framework as a blueprint for porting to other hypervisor
implementations, porting other operating systems, or just as a general
experimental layer that could be used for debugging or performance
instrumentation.

Zach

2006-08-08 10:56:30

by Thomas Renninger

Subject: RE: Options depending on STANDALONE

On Mon, 2006-08-07 at 19:33 +0200, Thomas Renninger wrote:
> On Thu, 2006-08-03 at 16:49 -0400, Brown, Len wrote:
> > >On Thu, Aug 03, 2006 at 10:25:43PM +0200, Adrian Bunk wrote:
> > >> ACPI_CUSTOM_DSDT seems to be the most interesting case.
> > >> It's anyway not usable for distribution kernels, and AFAIR the ACPI
> > >> people prefer to get the kernel working with all original DSDTs
> > >> (which usually work with at least one other OS) than letting
> > >> the people workaround the problem by using a custom DSDT.
> > >
> > >Not true at all. For SuSE kernels, we have a patch that lets people
> > >load a new DSDT from initramfs due to broken machines requiring a
> > >replacement in order to work properly.
> >
> > CONFIG_ACPI_CUSTOM_DSDT allows hackers to debug their system
> > by building a modified DSDT into the kernel to over-ride what
> > came with the system. It would make no sense for a distro
> > to use it, unless the distro were shipping only on 1 model machine.
> > This technique is necessary for debugging, but makes no
> > sense for production.
> >
> > The initramfs method shipped by SuSE is more flexible, allowing
> > the hacker to stick the DSDT image in the initrd and use it
> > without re-compiling the kernel.
> >
> > I have refused to accept the initrd patch into Linux many times,
> > and always will.
> >
> > I've advised SuSE many times that they should not be shipping it,
> > as it means that their supported OS is running on modified firmware --
> > which, by definition, they can not support.
> Tainting the kernel if done so should be sufficient.
> > Indeed, one could view
> > this method as counter-productive to the evolution of Linux --
> > since it is our stated goal to run on the same machines that Windows
> > runs on -- without requiring customers to modify those machines
> > to run Linux.
>
> There are three reasons for the initrd patch (last one also applies for
> the compile in functionality):
>
> 1)
> There might be "BIOS bugs" that will never get fixed:
> https://bugzilla.novell.com/show_bug.cgi?id=160671
> (Because it's an obvious BIOS bug, "compatibility" fixing it could make
> things worse).
>
> 2)
> There might be "ACPICA/kernel bugs" that take a while until they get
> fixed:
>
> This happens often. There comes out a new machine, using AML in a
> slightly other way, we need to fix it in kernel/ACPICA. Until the patch
> appears mainline may take a month or two. Until the distro of your
> choice that makes use of the fix comes out might take half a year or
> more...
> And backporting ACPICA fixes to older kernels is currently not possible
> as ACPICA patches appear in a big bunch of some thousand lines patches.
> But this hopefully changes soon.
>
> In my mind come:
> - alias broken in certain cases
> https://bugzilla.novell.com/show_bug.cgi?id=113099
> - recon amount of elements in packages
> https://bugzilla.novell.com/show_bug.cgi?id=189488
> - wrong offsets at Field and Operation Region declarations
> -> should be compatible for quite a while now
> - ...
>
> 3)
> Debugging.
> This is why at least compile in or via initrd must be provided in
> mainline kernel IMHO. Intel people themselves ask the bug reporter to
> override ACPI tables with a patched table to debug the system.
> Do you really think ripping out all overriding functionality from the
> kernel is a good idea?

A last sentence...
I forgot the most important point that could make all others obsolete:
4)
Vendors don't care about Linux yet.
For laptops I know two vendors who eventually would fix their BIOS (for
special models, HP and Lenovo) and provide a BIOS update for customers.
If we could convince those to at least validate their BIOSes with Intel
ACPICA in some way, most stuff described in point 2 would not happen.
Hopefully Novell has more influence here than SUSE had to make at least
the big players take more care about Linux support. I think it's getting
better...

I hope those who never used ACPICA and ignored any "Could you please
switch this byte for us in AML code" cries from Linux customers will get
punished with incompatibility with newer M$ ACPI interpreters at some
time and will be forced to provide last minute updates and I hope it
hurts.

Thomas

2006-08-09 07:44:13

by Pavel Machek

Subject: Re: A proposal - binary

Hi!

> >Well, I guess we'd like VMI to be buildable in normal kernel build
> >tools ... and at that point, open sourcing it should be _really_ easy.
> >
> >And we'd prefer legal decisions not to influence technical ones. Maybe
> >we will decide to use binary interface after all, but seeing GPLed,
> >easily-buildable interface, first, means we can look at both solutions
> >and decide which one is better.
>
> I don't think you're actually arguing for the VMI ROM to be built into
> the kernel. But since this could be a valid interpretation of what you
> said, let me address that point so other readers of this thread don't
> misinterpret.

I actually was arguing for VMI ROM to be built into kernel. You have
pretty strong arguments why it will not work, but Xen is doing that,
and it would be at least very interesting to see how it works for
vmware. (And perhaps to decide that it does not work :-).

> On a purely technical level, the VMI layer must not be part of the
> normal kernel build. It must be distributed by the hypervisor to
> which

Oh yes, it can be part of the kernel build. #ifdef vmware_version_3_0_4 is
ugly, but at least it would force you not to change the interfaces too
often, which might be a good thing.

> We do use standard tools for building it, for the most part - although
> some perl scripting and elf munging magic is part of the build.
> Finally, since it is a ROM, we have to use a post-build tool to convert
> the extracted object to a ROM image and fix up the checksum. We don't
> have a problem including any of those tools in an open source
> distribution of the VMI ESX ROM once we finish sorting through the
> license issues. We've already fixed most of the problems we had with
> entangled header files so that we can create a buildable tarball that
> requires only standard GNU compilers, elf tools, and perl to run. I
> believe the only technical issue left is fixing the makefiles so that
> building it doesn't require our rather complicated make system.

Good, nice, so you are close. Now get us a GPLed release ;-).
Pavel

2006-08-13 16:36:06

by Adrian Bunk

Subject: Re: [v4l-dvb-maintainer] Options depending on STANDALONE

On Sun, Aug 06, 2006 at 01:18:59PM +0200, Oliver Endriss wrote:
> Adrian Bunk wrote:
> > On Thu, Aug 03, 2006 at 04:40:25PM -0700, Trent Piepho wrote:
> > > On Thu, 3 Aug 2006, Adrian Bunk wrote:
> > > > On Thu, Aug 03, 2006 at 03:56:17PM -0400, Dave Jones wrote:
> > > > > You're describing PREVENT_FIRMWARE_BUILD. The text Zach quoted is from
> > > > > STANDALONE, which is something else completely. That allows us to not
> > > > > build drivers that pull in things from /etc and the like during compile.
> > > > > (Whoever thought that was a good idea?)
> > > >
> > > > Is DVB_AV7110_FIRMWARE really still required?
> > > > ALL other drivers work without such an option.
> > >
> > > The other DVB drivers that need firmware load it when the device is opened
> > > or used (ie. a channel is tuned). At least for the ones I'm familiar
> > > with. If they are compiled directly into the kernel, they can still use
> > > FW_LOADER since the loading won't happen until well after booting is
> > > done.
> > >
> > > For AV7110, it looks like the firmware loading is done when the driver is
> > > first initialized. If AV7110 is compiled into the kernel, FW_LOADER can
> > > not be used. The filesystem with the firmware won't be mounted yet.
> > >
> > > So AV7110 has an option to compile a firmware file into the driver.
> >
> > But is there a technical reason why this has to be done this way?
> >
> > This is the only (non-OSS) driver doing it this way, and Zach has a
> > point that this is legally questionable.
>
> This option _is_ useful because it allows a user to build an
> av7110 driver without hotplug etc. I NAK any attempt to remove it.

If you look at the dependencies of DVB_AV7110 and the code in av7110.c
you'll note that your statement "it allows a user to build an
av7110 driver without hotplug" is not true.

> Sorry, a kernel option cannot cause a legal issue. Only the user does.
> For non-distribution kernels there is no difference whether firmware is
> loaded at run-time or compiled-in.
>
> Obviously, there might be a difference for distribution kernels if you
> are not allowed to distribute the firmware (imho not a problem in this
> case, but IANAL). Simple solution: Do not enable the option.

The general direction in Linux kernel development is to load the
firmware at runtime.

> I have no problem if you want to remove STANDALONE: Simply remove the
> dependency to STANDALONE, but keep DVB_AV7110_FIRMWARE with default 'n'.

The point of STANDALONE is to keep allmodconfig/allyesconfig builds working.

Removing the dependency on STANDALONE therefore implies that the option
compiles in those configurations.

> CU
> Oliver

cu
Adrian


2006-08-14 21:16:08

by Trent Piepho

Subject: Re: [v4l-dvb-maintainer] Options depending on STANDALONE

On Sun, 13 Aug 2006, Adrian Bunk wrote:
> On Sun, Aug 06, 2006 at 01:18:59PM +0200, Oliver Endriss wrote:
> > Adrian Bunk wrote:
> > > On Thu, Aug 03, 2006 at 04:40:25PM -0700, Trent Piepho wrote:
> > > > On Thu, 3 Aug 2006, Adrian Bunk wrote:
> > > > > On Thu, Aug 03, 2006 at 03:56:17PM -0400, Dave Jones wrote:
> > > > > > You're describing PREVENT_FIRMWARE_BUILD. The text Zach quoted is from
> > > > > > STANDALONE, which is something else completely. That allows us to not
> > > > > > build drivers that pull in things from /etc and the like during compile.
> > > > > > (Whoever thought that was a good idea?)
> > > > >
> > > > > Is DVB_AV7110_FIRMWARE really still required?
> > > > > ALL other drivers work without such an option.
> > > >
> > > > The other DVB drivers that need firmware load it when the device is opened
> > > > or used (ie. a channel is tuned). At least for the ones I'm familiar
> > > > with. If they are compiled directly into the kernel, they can still use
> > > > FW_LOADER since the loading won't happen until well after booting is
> > > > done.
> > > >
> > > > For AV7110, it looks like the firmware loading is done when the driver is
> > > > first initialized. If AV7110 is compiled into the kernel, FW_LOADER can
> > > > not be used. The filesystem with the firmware won't be mounted yet.
> > > >
> > > > So AV7110 has an option to compile a firmware file into the driver.
> > >
> > > But is there a technical reason why this has to be done this way?

Is there another way to load firmware in a driver compiled into the kernel?

> > > This is the only (non-OSS) driver doing it this way, and Zach has a
> > > point that this is legally questionable.

I know there are other DVB drivers that can have firmware compiled in
instead of using FW_LOADER. They just don't show that ability in Kconfig;
you have to edit the driver to enable compiled-in firmware.

> > This option _is_ useful because it allows a user to build an
> > av7110 driver without hotplug etc. I NAK any attempt to remove it.
>
> If you look at the dependencies of DVB_AV7110 and the code in av7110.c
> you'll note that your statement "it allows a user to build an
> av7110 driver without hotplug" is not true.

Looks like a mistake in the Kconfig file:
- select FW_LOADER
+ select FW_LOADER if DVB_AV7110_FIRMWARE=n
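
To make the two cases concrete, here is a minimal user-space sketch in C.
It is not the av7110 code: fw_image, get_firmware, load_from_fs and the
placeholder bytes are all made up for illustration, and the real driver
makes this choice at compile time through Kconfig rather than through a
runtime flag as modeled here.

```c
#include <stddef.h>

/* Hypothetical container for a firmware image. */
struct fw_image {
    const unsigned char *data;
    size_t size;
};

/* Placeholder bytes standing in for a firmware file compiled into the driver. */
static const unsigned char builtin_fw[] = { 0xde, 0xad, 0xbe, 0xef };

/* Placeholder bytes standing in for a firmware file found on disk. */
static const unsigned char fs_fw[] = { 0x01, 0x02 };

/*
 * Models a loader that needs a mounted filesystem (the FW_LOADER case).
 * Returns 0 on success, -1 if the filesystem is not available yet.
 */
static int load_from_fs(int rootfs_mounted, struct fw_image *fw)
{
    if (!rootfs_mounted)
        return -1;
    fw->data = fs_fw;
    fw->size = sizeof(fs_fw);
    return 0;
}

/*
 * use_builtin models DVB_AV7110_FIRMWARE=y: the image is always available,
 * even at driver init time before any filesystem is mounted.
 * Otherwise the loader path is taken, which fails early in boot.
 */
static int get_firmware(int use_builtin, int rootfs_mounted, struct fw_image *fw)
{
    if (use_builtin) {
        fw->data = builtin_fw;
        fw->size = sizeof(builtin_fw);
        return 0;
    }
    return load_from_fs(rootfs_mounted, fw);
}
```

In this model get_firmware(1, 0, &fw) succeeds, get_firmware(0, 0, &fw)
fails, and get_firmware(0, 1, &fw) succeeds. Only the second case is the
problem: loader-only firmware requested at init time of a built-in driver.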

2006-08-27 21:45:06

by Adrian Bunk

Subject: Re: [v4l-dvb-maintainer] Options depending on STANDALONE

On Mon, Aug 14, 2006 at 02:15:26PM -0700, Trent Piepho wrote:
> On Sun, 13 Aug 2006, Adrian Bunk wrote:
> > On Sun, Aug 06, 2006 at 01:18:59PM +0200, Oliver Endriss wrote:
> > > Adrian Bunk wrote:
> > > > On Thu, Aug 03, 2006 at 04:40:25PM -0700, Trent Piepho wrote:
> > > > > On Thu, 3 Aug 2006, Adrian Bunk wrote:
> > > > > > On Thu, Aug 03, 2006 at 03:56:17PM -0400, Dave Jones wrote:
> > > > > > > You're describing PREVENT_FIRMWARE_BUILD. The text Zach quoted is from
> > > > > > > STANDALONE, which is something else completely. That allows us to not
> > > > > > > build drivers that pull in things from /etc and the like during compile.
> > > > > > > (Whoever thought that was a good idea?)
> > > > > >
> > > > > > Is DVB_AV7110_FIRMWARE really still required?
> > > > > > ALL other drivers work without such an option.
> > > > >
> > > > > The other DVB drivers that need firmware load it when the device is opened
> > > > > > or used (i.e. a channel is tuned). At least for the ones I'm familiar
> > > > > > with. If they are compiled directly into the kernel, they can still use
> > > > > > FW_LOADER since the loading won't happen until well after booting is
> > > > > done.
> > > > >
> > > > > For AV7110, it looks like the firmware loading is done when the driver is
> > > > > first initialized. If AV7110 is compiled into the kernel, FW_LOADER can
> > > > > not be used. The filesystem with the firmware won't be mounted yet.
> > > > >
> > > > > So AV7110 has an option to compile a firmware file into the driver.
> > > >
> > > > But is there a technical reason why this has to be done this way?
>
> Is there another way to load firmware in a driver compiled into the kernel?

The CONFIG_DVB_AV7110_FIRMWARE=n code should work fine.

> > > > This is the only (non-OSS) driver doing it this way, and Zach has a
> > > > point that this is legally questionable.
>
> I know there are other DVB drivers that can have firmware compiled in
> instead of using FW_LOADER. They just don't show that ability in Kconfig;
> you have to edit the driver to enable compiled-in firmware.
>
> > > This option _is_ useful because it allows a user to build an
> > > av7110 driver without hotplug etc. I NAK any attempt to remove it.
> >
> > If you look at the dependencies of DVB_AV7110 and the code in av7110.c
> > you'll note that your statement "it allows a user to build an
> > av7110 driver without hotplug" is not true.
>
> Looks like a mistake in the Kconfig file:
> - select FW_LOADER
> + select FW_LOADER if DVB_AV7110_FIRMWARE=n

Sure, it could be fixed.

But the fact that it didn't work doesn't create a strong reason for
keeping it.

And a "kernel without hotplug" is no longer possible anyway in the
usual CONFIG_EMBEDDED=n case.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed