2009-09-01 11:12:25

by Bart Van Assche

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Mon, Aug 31, 2009 at 8:00 PM, James Bottomley
<[email protected]> wrote:
>
> On Mon, 2009-08-31 at 10:28 -0700, Alok Kataria wrote:
> > VMware PVSCSI driver - v2.
>
> OK, so the first thing that springs to mind is that we already have one
> of these things: the ibmvscsi ... is there no way we can share code
> between this and the other PV drivers?

Good question. But shouldn't the ibmvscsi driver be refactored before
considering sharing ibmvscsi code with other paravirtualized drivers?
A quote from the ibmvscsi.c source code:

* TODO: This is currently pretty tied to the IBM i/pSeries hypervisor
* interfaces. It would be really nice to abstract this above an RDMA
* layer.

Splitting the ibmvscsi.c driver into an SRP initiator and an RDMA driver
would make the following possible:
- Reuse the existing SRP initiator (ib_srp). Currently there are two
SRP initiators present in the Linux kernel -- one that uses the RDMA
verbs API (ib_srp) and one that only works with IBM's i/pSeries
hypervisor (ibmvscsi).
- Reuse the ib_ipoib kernel module to provide an IP stack on top of
the new RDMA driver instead of having to maintain a separate network
driver for this hardware (ibmveth).

More information about the architecture for which the ibmvscsi and
ibmveth drivers were developed can be found in the following paper:
D. Boutcher and D. Engebretsen, Linux Virtualization on IBM POWER5
Systems, Proceedings of the Linux Symposium, Vol. 1, July 2004, pp.
113-120 (http://www.kernel.org/doc/mirror/ols2004v1.pdf).

Bart.


2009-09-01 14:18:04

by James Bottomley

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, 2009-09-01 at 13:12 +0200, Bart Van Assche wrote:
> On Mon, Aug 31, 2009 at 8:00 PM, James Bottomley
> <[email protected]> wrote:
> >
> > On Mon, 2009-08-31 at 10:28 -0700, Alok Kataria wrote:
> > > VMware PVSCSI driver - v2.
> >
> > OK, so the first thing that springs to mind is that we already have one
> > of these things: the ibmvscsi ... is there no way we can share code
> > between this and the other PV drivers?
>
> Good question. But shouldn't the ibmvscsi driver be refactored before
> considering sharing ibmvscsi code with other paravirtualized drivers ?

Not really, that would make it a chicken and egg problem. The question
was meant to direct attention to the issue of whether we should share
code for PV drivers or not. I think the answer to this one is yes; the
next thing is how to do it.

The one thing I'm not really keen on having is half a dozen totally
different virtual SCSI drivers for our half a dozen virtualisation
solutions. Apart from the coding waste, each will have new and
different bugs and a much smaller pool of users to find them.

The IBM vscsi operates slightly differently from the way newer PV
drivers may be expected to operate, but the SRP abstraction does look
like a reasonable one for a PV driver.

> A quote from the ibmvscsi.c source code:
>
> * TODO: This is currently pretty tied to the IBM i/pSeries hypervisor
> * interfaces. It would be really nice to abstract this above an RDMA
> * layer.
>
> Splitting the ibmvscsi.c driver in an SRP initiator and an RDMA driver
> would make the following possible:
> - Reuse the existing SRP initiator (ib_srp). Currently there are two
> SRP initiators present in the Linux kernel -- one that uses the RDMA
> verbs API (ib_srp) and one that only works with IBM's i/pSeries
> hypervisor (ibmvscsi).
> - Reuse the ib_ipoib kernel module to provide an IP stack on top of
> the new RDMA driver instead of having to maintain a separate network
> driver for this hardware (ibmveth).

So the RDMA piece is what I'm not sure about. For a protocol
abstraction, SRP makes a lot of sense. For a hypervisor interface, it's
not really clear that RDMA is the best way to go. In fact, some more
minimal DMA ring implementation seems to be the way most hypervisors are
set up, but it's still possible to run a nice SRP abstraction over them.
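
Roughly, such a minimal ring interface tends to look something like the
sketch below (names and layout invented purely for illustration; this is
not taken from lguest, Xen, KVM or the PVSCSI patch):

#include <linux/types.h>
#include <linux/io.h>

/* One request slot in a guest/hypervisor shared-memory ring.  Purely
 * illustrative; real interfaces differ in the details. */
struct pv_req {
	u64 data_addr;	/* guest-physical address of the data buffer */
	u32 data_len;
	u16 flags;	/* e.g. read vs. write */
	u16 tag;	/* used to match the completion back to the request */
};

struct pv_ring {
	u32 prod;		/* producer index, written by the guest */
	u32 cons;		/* consumer index, written by the hypervisor */
	struct pv_req req[256];
};

static void pv_submit(struct pv_ring *ring, const struct pv_req *req,
		      void __iomem *doorbell)
{
	ring->req[ring->prod % 256] = *req;
	wmb();			/* make the slot visible before the index */
	ring->prod++;
	writel(1, doorbell);	/* or a hypercall, depending on the hypervisor */
}

The SRP (or other SAM-style) framing would then just be the payload
format of the slots; the ring itself stays protocol-agnostic.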

> More information about the architecture the ibmvscsi and the ibmveth
> drivers have been developed for can be found in the following paper:
> D. Boutcher and D. Engebretsen, Linux Virtualization on IBM POWER5
> Systems, Proceedings of the Linux Symposium, Vol. 1, July 2004, pp.
> 113-120 (http://www.kernel.org/doc/mirror/ols2004v1.pdf).

The other piece of this is that it's not clear that SCSI is actually the
best layer for this abstraction. For a simple, fast storage interface,
nbd is probably the easiest abstraction to do (the disadvantage being
the lack of ioctl support, so it really only does storage).

James

2009-09-01 16:12:46

by Roland Dreier

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.


> - Reuse the existing SRP initiator (ib_srp). Currently there are two
> SRP initiators present in the Linux kernel -- one that uses the RDMA
> verbs API (ib_srp) and one that only works with IBM's i/pSeries
> hypervisor (ibmvscsi).

This would be sane, although the difference in management APIs, etc.,
made this seem like quite a bit of work when I looked at it (hence the
existence of both ibmvscsi and ib_srp).

> - Reuse the ib_ipoib kernel module to provide an IP stack on top of
> the new RDMA driver instead of having to maintain a separate network
> driver for this hardware (ibmveth).

I don't think this really makes sense, because IPoIB is not really
handling Ethernet (it is a different L2 encapsulation entirely), and I
think the commonality with ibmveth is going to be minimal.

I'm not really sure we should be trying to force drivers to share just
because they are paravirtualized -- if there is real commonality, then
sure put it in common code, but different hypervisors are probably as
different as different hardware.

- R.

2009-09-01 16:16:51

by Matthew Wilcox

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, Sep 01, 2009 at 09:12:43AM -0700, Roland Dreier wrote:
> I'm not really sure we should be trying to force drivers to share just
> because they are paravirtualized -- if there is real commonality, then
> sure put it in common code, but different hypervisors are probably as
> different as different hardware.

I really disagree. These kinds of virtualised drivers are pretty much
communication protocols, not hardware. As such, why design a new one?
If there's an infelicity in the ibmvscsi protocol, it makes sense to
design a new one. But being different for the sake of being different
is just a way to generate a huge amount of make-work.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-09-01 16:33:50

by Dmitry Torokhov

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tuesday 01 September 2009 09:16:51 am Matthew Wilcox wrote:
> On Tue, Sep 01, 2009 at 09:12:43AM -0700, Roland Dreier wrote:
> > I'm not really sure we should be trying to force drivers to share just
> > because they are paravirtualized -- if there is real commonality, then
> > sure put it in common code, but different hypervisors are probably as
> > different as different hardware.
>
> I really disagree. This kind of virtualised drivers are pretty much
> communication protocols, and not hardware. As such, why design a new one?
> If there's an infelicity in the ibmvscsi protocol, it makes sense to
> design a new one. But being different for the sake of being different
> is just a way to generate a huge amount of make-work.
>

The same thing can be said about pretty much anything. We don't have a
single SCSI, network, etc. driver handling every device in its
respective class; I don't see why it would be different here.
A hypervisor presents the same interface to the guest OS (whether
it is Linux, Solaris or another OS) much like a piece of silicon
does, and it may very well be different from other hypervisors.

--
Dmitry

2009-09-01 16:34:31

by Bart Van Assche

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, Sep 1, 2009 at 6:12 PM, Roland Dreier <[email protected]> wrote:
> > - Reuse the ib_ipoib kernel module to provide an IP stack on top of
> > the new RDMA driver instead of having to maintain a separate network
> > driver for this hardware (ibmveth).
>
> I don't think this really makes sense, because IPoIB is not really
> handling ethernet (it is a different L2 ethernet encapsulation), and I
> think the commonality with ibmveth is going to be minimal.

What I had in mind was not to start searching for code shared between
the ipoib and ibmveth kernel modules, but to replace the virtual
Ethernet layer with IPoIB on top of a new RDMA driver. I'm not sure,
however, whether this approach would work better than the approach
currently implemented in ibmveth.

> I'm not really sure we should be trying to force drivers to share just
> because they are paravirtualized -- if there is real commonality, then
> sure put it in common code, but different hypervisors are probably as
> different as different hardware.

Agreed. But several people are currently looking at how to improve the
performance of I/O performed inside a virtual machine without being
familiar with the VIA architecture or the RDMA API. This is a pity
because the Virtual Interface Architecture was designed to allow
high-throughput low-latency I/O, and has some features that are not
present in any other mainstream I/O architecture I know of (e.g. the
ability to perform I/O from userspace without having to invoke any
system call in the performance-critical path).
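
As a rough illustration with the verbs API (a sketch only; the helper
itself is hypothetical, and the queue pair, memory registration and
connection setup are assumed to have been done beforehand through the
normal slow-path calls):

#include <stdint.h>
#include <infiniband/verbs.h>

/* Post one RDMA write from userspace.  ibv_post_send() rings the HCA's
 * doorbell through memory mapped into the process, so no system call is
 * issued on this path. */
static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
			   void *buf, uint32_t len,
			   uint64_t remote_addr, uint32_t rkey)
{
	struct ibv_sge sge = {
		.addr   = (uintptr_t)buf,
		.length = len,
		.lkey   = mr->lkey,
	};
	struct ibv_send_wr wr = {
		.wr_id      = 1,
		.sg_list    = &sge,
		.num_sge    = 1,
		.opcode     = IBV_WR_RDMA_WRITE,
		.send_flags = IBV_SEND_SIGNALED,
	};
	struct ibv_send_wr *bad_wr;

	wr.wr.rdma.remote_addr = remote_addr;
	wr.wr.rdma.rkey        = rkey;

	return ibv_post_send(qp, &wr, &bad_wr);
}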

Bart.

2009-09-01 16:52:15

by James Bottomley

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, 2009-09-01 at 09:33 -0700, Dmitry Torokhov wrote:
> On Tuesday 01 September 2009 09:16:51 am Matthew Wilcox wrote:
> > On Tue, Sep 01, 2009 at 09:12:43AM -0700, Roland Dreier wrote:
> > > I'm not really sure we should be trying to force drivers to share just
> > > because they are paravirtualized -- if there is real commonality, then
> > > sure put it in common code, but different hypervisors are probably as
> > > different as different hardware.
> >
> > I really disagree. This kind of virtualised drivers are pretty much
> > communication protocols, and not hardware. As such, why design a new one?
> > If there's an infelicity in the ibmvscsi protocol, it makes sense to
> > design a new one. But being different for the sake of being different
> > is just a way to generate a huge amount of make-work.
> >
>
> The same thing can be said about pretty much anything. We don't have
> single SCSI, network, etc driver handling every devices in their
> respective class, I don't see why it would be different here.
> A hypervisor presents the same interface to the guest OS (whether
> it is Linux, Solaris or another OS) much like a piece of silicone
> does and it may very well be different form other hypervisors.

Nobody said you had to have the exact same driver for every hypervisor.
What people are suggesting is that we look at commonalities in the
interfaces both from a control plane point of view (transport class) and
from a code sharing point of view (libscsivirt). However, all the
hypervisor interfaces I've seen are basically DMA rings ... they really
do seem to be very similar across hypervisors, so it does seem there
could be a lot of shared commonality. I'm not going to insist on RDMA
emulation, but perhaps you lot should agree on what a guest to
hypervisor DMA interface looks like.

James

2009-09-01 16:59:34

by Alok Kataria

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, 2009-09-01 at 09:52 -0700, James Bottomley wrote:
> On Tue, 2009-09-01 at 09:33 -0700, Dmitry Torokhov wrote:
> > On Tuesday 01 September 2009 09:16:51 am Matthew Wilcox wrote:
> > > On Tue, Sep 01, 2009 at 09:12:43AM -0700, Roland Dreier wrote:
> > > > I'm not really sure we should be trying to force drivers to share just
> > > > because they are paravirtualized -- if there is real commonality, then
> > > > sure put it in common code, but different hypervisors are probably as
> > > > different as different hardware.
> > >
> > > I really disagree. This kind of virtualised drivers are pretty much
> > > communication protocols, and not hardware. As such, why design a new one?
> > > If there's an infelicity in the ibmvscsi protocol, it makes sense to
> > > design a new one. But being different for the sake of being different
> > > is just a way to generate a huge amount of make-work.
> > >
> >
> > The same thing can be said about pretty much anything. We don't have
> > single SCSI, network, etc driver handling every devices in their
> > respective class, I don't see why it would be different here.
> > A hypervisor presents the same interface to the guest OS (whether
> > it is Linux, Solaris or another OS) much like a piece of silicone
> > does and it may very well be different form other hypervisors.
>
> Nobody said you had to have the exact same driver for every hypervisor.
> What people are suggesting is that we look at commonalities in the
> interfaces both from a control plane point of view (transport class) and
> from a code sharing point of view (libscsivirt). However, all the
> hypervisor interfaces I've seen are basically DMA rings ... they really
> do seem to be very similar across hypervisors, so it does seem there
> could be a lot of shared commonality. I'm not going to insist on RDMA
> emulation, but perhaps you lot should agree on what a guest to
> hypervisor DMA interface looks like.

Which other hypervisor driver are you talking about? ibmvscsi is using
RDMA emulation, and I don't think you mean that.

Anyway, how large is the DMA code that we are worrying about here?
Only about 300-400 LOC? I don't think we want to over-design for
such small gains.

Alok
>
> James
>
>

2009-09-01 17:25:13

by Roland Dreier

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.


> Nobody said you had to have the exact same driver for every hypervisor.
> What people are suggesting is that we look at commonalities in the
> interfaces both from a control plane point of view (transport class) and
> from a code sharing point of view (libscsivirt). However, all the
> hypervisor interfaces I've seen are basically DMA rings ...

I don't think that's anything special about hypervisors though -- pretty
much all modern device interfaces are basically DMA rings, aren't they?
I'm definitely in favor of common code to handle commonality but on the
other hand I don't see what's so special about virtual devices vs. real
HW devices. On the one side we have VMware's closed hypervisor code
and on the other side we have vendor XYZ's closed RTL and firmware code.

- R.

2009-09-01 17:25:59

by James Bottomley

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, 2009-09-01 at 09:59 -0700, Alok Kataria wrote:
> On Tue, 2009-09-01 at 09:52 -0700, James Bottomley wrote:
> > On Tue, 2009-09-01 at 09:33 -0700, Dmitry Torokhov wrote:
> > > On Tuesday 01 September 2009 09:16:51 am Matthew Wilcox wrote:
> > > > On Tue, Sep 01, 2009 at 09:12:43AM -0700, Roland Dreier wrote:
> > > > > I'm not really sure we should be trying to force drivers to share just
> > > > > because they are paravirtualized -- if there is real commonality, then
> > > > > sure put it in common code, but different hypervisors are probably as
> > > > > different as different hardware.
> > > >
> > > > I really disagree. This kind of virtualised drivers are pretty much
> > > > communication protocols, and not hardware. As such, why design a new one?
> > > > If there's an infelicity in the ibmvscsi protocol, it makes sense to
> > > > design a new one. But being different for the sake of being different
> > > > is just a way to generate a huge amount of make-work.
> > > >
> > >
> > > The same thing can be said about pretty much anything. We don't have
> > > single SCSI, network, etc driver handling every devices in their
> > > respective class, I don't see why it would be different here.
> > > A hypervisor presents the same interface to the guest OS (whether
> > > it is Linux, Solaris or another OS) much like a piece of silicone
> > > does and it may very well be different form other hypervisors.
> >
> > Nobody said you had to have the exact same driver for every hypervisor.
> > What people are suggesting is that we look at commonalities in the
> > interfaces both from a control plane point of view (transport class) and
> > from a code sharing point of view (libscsivirt). However, all the
> > hypervisor interfaces I've seen are basically DMA rings ... they really
> > do seem to be very similar across hypervisors, so it does seem there
> > could be a lot of shared commonality. I'm not going to insist on RDMA
> > emulation, but perhaps you lot should agree on what a guest to
> > hypervisor DMA interface looks like.
>
> Which is this other hypervisor driver that you are talking about,
> ibmvscsi is using RDMA emulation and I don't think you mean that.

lguest uses the sg_ring abstraction. Xen and KVM were certainly looking
at this too.

> And anyways how large is the DMA code that we are worrying about here ?
> Only about 300-400 LOC ? I don't think we might want to over-design for
> such small gains.

So even if you have different DMA code, the remaining thousand or so
lines would be in common. That's a worthwhile improvement.

The benefit to users would be a common control plane and interface from
the transport class, plus common code means more testers regardless of
virtualisation technology chosen.

James

2009-09-01 17:40:15

by James Bottomley

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, 2009-09-01 at 10:25 -0700, Roland Dreier wrote:
> > Nobody said you had to have the exact same driver for every hypervisor.
> > What people are suggesting is that we look at commonalities in the
> > interfaces both from a control plane point of view (transport class) and
> > from a code sharing point of view (libscsivirt). However, all the
> > hypervisor interfaces I've seen are basically DMA rings ...
>
> I don't think that's anything special about hypervisors though -- pretty
> much all modern device interfaces are basically DMA rings, aren't they?
> I'm definitely in favor of common code to handle commonality but on the
> other hand I don't see what's so special about virtual devices vs. real
> HW devices. One the one side we have VMware's closed hypervisor code
> and on the other side we have vendor XYZ's closed RTL and firmware code.

But the main difference between actual hardware and hypervisors is the
fact that to set up a DMA transfer you have to poke registers on the
card, set up a mailbox and manage queues of commands to the card. For a
hypervisor, sending a DMA transaction is a hypercall.

Now for most physical drivers, take for example FCP ones, we have a
common control plane interface (fc transport class) and we're evolving a
frame handling library (libfc), so all the drivers really have is the
specific code to bit-bang the hardware. Some of the libfc handling is
actually done in intelligent offload firmware on the HBAs, so some will
use more or less of the libfc handling (same is true for SAS and
libsas). When there's no actual hardware to be bit banged, and no real
firmware offload, it does make one wonder what would be left unique to
the driver.

James

2009-09-01 17:41:16

by Alok Kataria

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.


On Tue, 2009-09-01 at 10:25 -0700, James Bottomley wrote:
> On Tue, 2009-09-01 at 09:59 -0700, Alok Kataria wrote:
> > On Tue, 2009-09-01 at 09:52 -0700, James Bottomley wrote:
> > > On Tue, 2009-09-01 at 09:33 -0700, Dmitry Torokhov wrote:
> > > > On Tuesday 01 September 2009 09:16:51 am Matthew Wilcox wrote:
> > > > > On Tue, Sep 01, 2009 at 09:12:43AM -0700, Roland Dreier wrote:
> > > > > > I'm not really sure we should be trying to force drivers to share just
> > > > > > because they are paravirtualized -- if there is real commonality, then
> > > > > > sure put it in common code, but different hypervisors are probably as
> > > > > > different as different hardware.
> > > > >
> > > > > I really disagree. This kind of virtualised drivers are pretty much
> > > > > communication protocols, and not hardware. As such, why design a new one?
> > > > > If there's an infelicity in the ibmvscsi protocol, it makes sense to
> > > > > design a new one. But being different for the sake of being different
> > > > > is just a way to generate a huge amount of make-work.
> > > > >
> > > >
> > > > The same thing can be said about pretty much anything. We don't have
> > > > single SCSI, network, etc driver handling every devices in their
> > > > respective class, I don't see why it would be different here.
> > > > A hypervisor presents the same interface to the guest OS (whether
> > > > it is Linux, Solaris or another OS) much like a piece of silicone
> > > > does and it may very well be different form other hypervisors.
> > >
> > > Nobody said you had to have the exact same driver for every hypervisor.
> > > What people are suggesting is that we look at commonalities in the
> > > interfaces both from a control plane point of view (transport class) and
> > > from a code sharing point of view (libscsivirt). However, all the
> > > hypervisor interfaces I've seen are basically DMA rings ... they really
> > > do seem to be very similar across hypervisors, so it does seem there
> > > could be a lot of shared commonality. I'm not going to insist on RDMA
> > > emulation, but perhaps you lot should agree on what a guest to
> > > hypervisor DMA interface looks like.
> >
> > Which is this other hypervisor driver that you are talking about,
> > ibmvscsi is using RDMA emulation and I don't think you mean that.
>
> lguest uses the sg_ring abstraction. Xen and KVM were certainly looking
> at this too.

I don't see the sg_ring abstraction that you are talking about. Can you
please give me some pointers?
Also, regarding Xen and KVM, I think they are using the xenbus/vbus
interface, which is quite different from what we do here.

>
> > And anyways how large is the DMA code that we are worrying about here ?
> > Only about 300-400 LOC ? I don't think we might want to over-design for
> > such small gains.
>
> So even if you have different DMA code, the remaining thousand or so
> lines would be in common. That's a worthwhile improvement.

And not just that: different HV vendors can have different features.
Say XYZ comes up tomorrow and implements a multiple-rings interface;
then the feature set doesn't remain common and we will have less
code to share in the not-so-distant future.


Thanks,
Alok
>
> The benefit to users would be a common control plane and interface from
> the transport class, plus common code means more testers regardless of
> virtualisation technology chosen.


>
> James
>
>

2009-09-01 17:54:35

by Alok Kataria

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.


On Tue, 2009-09-01 at 10:40 -0700, James Bottomley wrote:
> On Tue, 2009-09-01 at 10:25 -0700, Roland Dreier wrote:
> > > Nobody said you had to have the exact same driver for every hypervisor.
> > > What people are suggesting is that we look at commonalities in the
> > > interfaces both from a control plane point of view (transport class) and
> > > from a code sharing point of view (libscsivirt). However, all the
> > > hypervisor interfaces I've seen are basically DMA rings ...
> >
> > I don't think that's anything special about hypervisors though -- pretty
> > much all modern device interfaces are basically DMA rings, aren't they?
> > I'm definitely in favor of common code to handle commonality but on the
> > other hand I don't see what's so special about virtual devices vs. real
> > HW devices. One the one side we have VMware's closed hypervisor code
> > and on the other side we have vendor XYZ's closed RTL and firmware code.
>
> But the main difference between actual hardware and hypervisors is the
> fact that to set up a DMA transfer you have to poke registers on the
> card, set up a mailbox and manage queues of commands to the card. For a
> hypervisor, sending a DMA transaction is a hypercall.

Not really, it depends on how you see it. VMware exports different IO
registers too, which need to be bit-banged to start some IO, so starting
an IO is not just a hypercall but a series of commands. Look at
pvscsi_kick_io; also, the driver and the hypervisor code share the
request rings and completion rings, which is quite similar to how a
command queue is managed for a card.

Also note that the way all these things are implemented for each of the
hypervisor devices will differ, and getting every hv-vendor to agree on a
common set of things is not a very attractive proposition (at least, this
I can say from my past experience).


>
> Now for most physical drivers, take for example FCP ones, we have a
> common control plane interface (fc transport class), we're evolving a
> frame handling library (libfc) so all the drivers really have are
> specific codes to bit bang the hardware. Some of the libfc handling is
> actually done in intelligent offload firmware on the HBAs, so some will
> use more or less of the libfc handling (same is true for SAS and
> libsas). When there's no actual hardware to be bit banged, and no real
> firmware offload, it does make one wonder what would be left unique to
> the driver.
>
> James
>

Alok

2009-09-01 18:16:03

by James Bottomley

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, 2009-09-01 at 10:41 -0700, Alok Kataria wrote:
> > lguest uses the sg_ring abstraction. Xen and KVM were certainly looking
> > at this too.
>
> I don't see the sg_ring abstraction that you are talking about. Can you
> please give me some pointers.

it's in drivers/lguest ... apparently it's vring now and the code is in
drivers/virtio
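
For reference, the per-slot descriptor there looks roughly like this
(paraphrased from include/linux/virtio_ring.h):

#include <linux/types.h>

/* This marks a buffer as continuing via the next field. */
#define VRING_DESC_F_NEXT	1
/* This marks a buffer as write-only for the device (otherwise read-only). */
#define VRING_DESC_F_WRITE	2

struct vring_desc {
	__u64 addr;	/* buffer address (guest-physical) */
	__u32 len;	/* buffer length */
	__u16 flags;	/* VRING_DESC_F_NEXT, VRING_DESC_F_WRITE */
	__u16 next;	/* index of the next descriptor in the chain */
};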

> Also regarding Xen and KVM I think they are using the xenbus/vbus
> interface, which is quite different than what we do here.

Not sure about Xen ... KVM uses virtio above.

> >
> > > And anyways how large is the DMA code that we are worrying about here ?
> > > Only about 300-400 LOC ? I don't think we might want to over-design for
> > > such small gains.
> >
> > So even if you have different DMA code, the remaining thousand or so
> > lines would be in common. That's a worthwhile improvement.
>
> And not just that, different HV-vendors can have different features,
> like say XYZ can come up tomorrow and implement the multiple rings
> interface so the feature set doesn't remain common and we will have less
> code to share in the not so distant future.

Multiple rings is really just a multiqueue abstraction. That's fine,
but it needs a standard multiqueue control plane.

The desire to one up the competition by adding a new whiz bang feature
to which you code a special interface is very common in the storage
industry. The counter pressure is that consumers really like these
things standardised. That's what the transport class abstraction is all
about.

We also seem to be off on a tangent about hypervisor interfaces. I'm
actually more interested in the utility of an SRP abstraction or at
least something SAM based. It seems that in your driver you don't quite
do the task management functions as SAM requests, but do them over your
own protocol abstractions.

James

2009-09-01 18:38:47

by Christoph Hellwig

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, Sep 01, 2009 at 05:54:36PM +0000, Alok Kataria wrote:
> Also note that, the way all these things are implemented for each of the
> hypervisor devices will differ and getting every hv-vendor to agree on a
> common set of things is not very attractive proposition ( atleast, this
> I can say from my past experiences).

virtio is shared by three or four hypervisors depending on how you
count it, with others under development.

2009-09-02 02:55:41

by Alok Kataria

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, 2009-09-01 at 11:15 -0700, James Bottomley wrote:
> On Tue, 2009-09-01 at 10:41 -0700, Alok Kataria wrote:
> > > lguest uses the sg_ring abstraction. Xen and KVM were certainly looking
> > > at this too.
> >
> > I don't see the sg_ring abstraction that you are talking about. Can you
> > please give me some pointers.
>
> it's in drivers/lguest ... apparently it's vring now and the code is in
> driver/virtio
>
> > Also regarding Xen and KVM I think they are using the xenbus/vbus
> > interface, which is quite different than what we do here.
>
> Not sure about Xen ... KVM uses virtio above.
>
> > >
> > > > And anyways how large is the DMA code that we are worrying about here ?
> > > > Only about 300-400 LOC ? I don't think we might want to over-design for
> > > > such small gains.
> > >
> > > So even if you have different DMA code, the remaining thousand or so
> > > lines would be in common. That's a worthwhile improvement.

I don't see how; the rest of the code consists of IO/MMIO space & ring
processing, which is very different in each of the implementations. What
is left is the setup and initialization code, which obviously depends on
the implementation of the driver data structures.

> >
> > And not just that, different HV-vendors can have different features,
> > like say XYZ can come up tomorrow and implement the multiple rings
> > interface so the feature set doesn't remain common and we will have less
> > code to share in the not so distant future.
>
> Multiple rings is really just a multiqueue abstraction. That's fine,
> but it needs a standard multiqueue control plane.
>
> The desire to one up the competition by adding a new whiz bang feature
> to which you code a special interface is very common in the storage
> industry. The counter pressure is that consumers really like these
> things standardised. That's what the transport class abstraction is all
> about.
>
> We also seem to be off on a tangent about hypervisor interfaces. I'm
> actually more interested in the utility of an SRP abstraction or at
> least something SAM based. It seems that in your driver you don't quite
> do the task management functions as SAM requests, but do them over your
> own protocol abstractions.

Okay, I think I need to take a step back here and understand what
you are actually asking for.

1. What do you mean by the "transport class abstraction"?
Do you mean that the way we communicate with the hypervisor needs to be
standardized?

2. Are you saying that we should use the virtio ring mechanism to handle
our request and completion rings?
We cannot do that. Our backend expects each slot on the ring to be
in a particular format, whereas vring expects each slot on the
vring to be in the vring_desc format.

3. Also, the way we communicate with the hypervisor backend is that the
driver writes to our device IO registers in a particular format. The
format that we follow is to first write the command to the
COMMAND_REGISTER and then write a stream of data words to the
DATA_REGISTER, which is a normal device interface.
The reason I make this point is to highlight that we are not making any
hypercalls; instead we communicate with the hypervisor by writing to
IO/memory-mapped regions. So from that perspective the driver has no
knowledge that it is talking to a software backend (aka device
emulation); instead it is very similar to how a driver talks to a silicon
device. The backend expects things in a certain way and we cannot
really change that interface (i.e. the ABI shared between device driver
and device emulation).
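
To make that shape concrete, issuing one such command would look roughly
like the sketch below; the register offsets, command number and argument
layout are invented for illustration and are not the actual ABI:

#include <linux/types.h>
#include <linux/io.h>

#define EXAMPLE_REG_COMMAND	0x0	/* command number is written here first */
#define EXAMPLE_REG_DATA	0x4	/* then the argument words, one by one */
#define EXAMPLE_CMD_SETUP_RINGS	1

static void example_write_cmd(void __iomem *regs, u32 cmd,
			      const u32 *args, unsigned int nargs)
{
	unsigned int i;

	writel(cmd, regs + EXAMPLE_REG_COMMAND);
	for (i = 0; i < nargs; i++)
		writel(args[i], regs + EXAMPLE_REG_DATA);
}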

So sharing code with vring or virtio is not something that works well
with our backend. The VMware PVSCSI driver is simply a virtual HBA and
shouldn't be looked at any differently.

Is there anything else that you are asking us to standardize?

Thanks,
Alok
>
> James
>
>

2009-09-02 09:50:52

by Bart Van Assche

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, Sep 1, 2009 at 8:38 PM, Christoph Hellwig <[email protected]> wrote:
> On Tue, Sep 01, 2009 at 05:54:36PM +0000, Alok Kataria wrote:
>> Also note that, the way all these things are implemented for each of the
>> hypervisor devices will differ and getting every hv-vendor to agree on a
>> common set of things is not very attractive proposition ( atleast, this
>> I can say from my past experiences).
>
> virtio is shared by three or four hypervisors depending on how you
> count it, with others under development.

Research is ongoing about how to reach higher throughput and lower
latency than what is possible with the virtio interface. See also Jake
Edge, AlacrityVM, August 5, 2009 (http://lwn.net/Articles/345296/).

Bart.

2009-09-02 15:06:17

by James Bottomley

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Tue, 2009-09-01 at 19:55 -0700, Alok Kataria wrote:
> On Tue, 2009-09-01 at 11:15 -0700, James Bottomley wrote:
> > On Tue, 2009-09-01 at 10:41 -0700, Alok Kataria wrote:
> > > > lguest uses the sg_ring abstraction. Xen and KVM were certainly looking
> > > > at this too.
> > >
> > > I don't see the sg_ring abstraction that you are talking about. Can you
> > > please give me some pointers.
> >
> > it's in drivers/lguest ... apparently it's vring now and the code is in
> > driver/virtio
> >
> > > Also regarding Xen and KVM I think they are using the xenbus/vbus
> > > interface, which is quite different than what we do here.
> >
> > Not sure about Xen ... KVM uses virtio above.
> >
> > > >
> > > > > And anyways how large is the DMA code that we are worrying about here ?
> > > > > Only about 300-400 LOC ? I don't think we might want to over-design for
> > > > > such small gains.
> > > >
> > > > So even if you have different DMA code, the remaining thousand or so
> > > > lines would be in common. That's a worthwhile improvement.
>
> I don't see how, the rest of the code comprises of IO/MMIO space & ring
> processing which is very different in each of the implementations. What
> is left is the setup and initialization code which obviously depends on
> the implementation of the driver data structures.

Are there benchmarks comparing the two approaches?

> > > And not just that, different HV-vendors can have different features,
> > > like say XYZ can come up tomorrow and implement the multiple rings
> > > interface so the feature set doesn't remain common and we will have less
> > > code to share in the not so distant future.
> >
> > Multiple rings is really just a multiqueue abstraction. That's fine,
> > but it needs a standard multiqueue control plane.
> >
> > The desire to one up the competition by adding a new whiz bang feature
> > to which you code a special interface is very common in the storage
> > industry. The counter pressure is that consumers really like these
> > things standardised. That's what the transport class abstraction is all
> > about.
> >
> > We also seem to be off on a tangent about hypervisor interfaces. I'm
> > actually more interested in the utility of an SRP abstraction or at
> > least something SAM based. It seems that in your driver you don't quite
> > do the task management functions as SAM requests, but do them over your
> > own protocol abstractions.
>
> Okay, I think I need to take a step back here and understand what
> actually are you asking for.
>
> 1. What do you mean by the "transport class abstraction" ?
> Do you mean that the way we communicate with the hypervisor needs to be
> standardized ?

Not really. Transport classes are designed to share code and provide a
uniform control plane when the underlying implementation is different.

> 2. Are you saying that we should use the virtio ring mechanism to handle
> our request and completion rings ?

That's an interesting question. Virtio is currently the standard linux
guest<=>hypervisor communication mechanism, but if you have comparative
benchmarks showing that virtual hardware emulation is faster, it doesn't
need to remain so.

> We can not do that. Our backend expects that each slot on the ring is
> in a particular format. Where as vring expects that each slot on the
> vring is in the vring_desc format.

Your backend is a software server, surely?

> 3. Also, the way we communicate with the hypervisor backend is that the
> driver writes to our device IO registers in a particular format. The
> format that we follow is to first write the command on the
> COMMAND_REGISTER and then write a stream of data words in the
> DATA_REGISTER, which is a normal device interface.
> The reason I make this point is to highlight we are not making any
> hypercalls instead we communicate with the hypervisor by writing to
> IO/Memory mapped regions. So from that perspective the driver has no
> knowledge that its is talking to a software backend (aka device
> emulation) instead it is very similar to how a driver talks to a silicon
> device. The backend expects things in a certain way and we cannot
> really change that interface ( i.e. the ABI shared between Device driver
> and Device Emulation).
>
> So sharing code with vring or virtio is not something that works well
> with our backend. The VMware PVSCSI driver is simply a virtual HBA and
> shouldn't be looked at any differently.
>
> Is their anything else that you are asking us to standardize ?

I'm not really asking you to standardise anything (yet). I was more
probing for why you hadn't included any of the SCSI control plane
interfaces and what led you to produce a different design from the
current patterns in virtual I/O. I think what I'm hearing is "Because
we didn't look at how modern SCSI drivers are constructed" and "Because
we didn't look at how virtual I/O is currently done in Linux". That's
OK (it's depressingly familiar in drivers), but now we get to figure out
what, if anything, makes sense from a SCSI control plane to a hypervisor
interface and whether this approach to hypervisor interfaces is better
or worse than virtio.

James

2009-09-02 17:16:28

by Alok Kataria

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.


On Wed, 2009-09-02 at 08:06 -0700, James Bottomley wrote:
> On Tue, 2009-09-01 at 19:55 -0700, Alok Kataria wrote:
> > On Tue, 2009-09-01 at 11:15 -0700, James Bottomley wrote:
> > > On Tue, 2009-09-01 at 10:41 -0700, Alok Kataria wrote:
> > > > > lguest uses the sg_ring abstraction. Xen and KVM were certainly looking
> > > > > at this too.
> > > >
> > > > I don't see the sg_ring abstraction that you are talking about. Can you
> > > > please give me some pointers.
> > >
> > > it's in drivers/lguest ... apparently it's vring now and the code is in
> > > driver/virtio
> > >
> > > > Also regarding Xen and KVM I think they are using the xenbus/vbus
> > > > interface, which is quite different than what we do here.
> > >
> > > Not sure about Xen ... KVM uses virtio above.
> > >
> > > > >
> > > > > > And anyways how large is the DMA code that we are worrying about here ?
> > > > > > Only about 300-400 LOC ? I don't think we might want to over-design for
> > > > > > such small gains.
> > > > >
> > > > > So even if you have different DMA code, the remaining thousand or so
> > > > > lines would be in common. That's a worthwhile improvement.
> >
> > I don't see how, the rest of the code comprises of IO/MMIO space & ring
> > processing which is very different in each of the implementations. What
> > is left is the setup and initialization code which obviously depends on
> > the implementation of the driver data structures.
>
> Are there benchmarks comparing the two approaches?

Benchmarks comparing what?

>
> > > > And not just that, different HV-vendors can have different features,
> > > > like say XYZ can come up tomorrow and implement the multiple rings
> > > > interface so the feature set doesn't remain common and we will have less
> > > > code to share in the not so distant future.
> > >
> > > Multiple rings is really just a multiqueue abstraction. That's fine,
> > > but it needs a standard multiqueue control plane.
> > >
> > > The desire to one up the competition by adding a new whiz bang feature
> > > to which you code a special interface is very common in the storage
> > > industry. The counter pressure is that consumers really like these
> > > things standardised. That's what the transport class abstraction is all
> > > about.
> > >
> > > We also seem to be off on a tangent about hypervisor interfaces. I'm
> > > actually more interested in the utility of an SRP abstraction or at
> > > least something SAM based. It seems that in your driver you don't quite
> > > do the task management functions as SAM requests, but do them over your
> > > own protocol abstractions.
> >
> > Okay, I think I need to take a step back here and understand what
> > actually are you asking for.
> >
> > 1. What do you mean by the "transport class abstraction" ?
> > Do you mean that the way we communicate with the hypervisor needs to be
> > standardized ?
>
> Not really. Transport classes are designed to share code and provide a
> uniform control plane when the underlying implementation is different.
>
> > 2. Are you saying that we should use the virtio ring mechanism to handle
> > our request and completion rings ?
>
> That's an interesting question. Virtio is currently the standard linux
> guest<=>hypervisor communication mechanism, but if you have comparative
> benchmarks showing that virtual hardware emulation is faster, it doesn't
> need to remain so.

It is a standard that KVM and lguest are using. I don't think it needs
any benchmarks to show whether a particular approach is faster or not.
VMware has supported paravirtualized devices in its backend for more than
a year now (maybe more, don't quote me on this), and the backend is
common across different guest OS's. Virtual hardware emulation helps us
give a common interface to different GOS's, whereas virtio binds this
heavily to Linux usage. And please note that the backend implementation
for our virtual device was done before virtio was integrated into
mainline.

Also, from your statements above it seems that you think we are
proposing to change the standard communication mechanism (between guest
& hypervisor) for Linux. For the record, that's not the case; the
standard that Linux-based VMs are using does not need to be
changed. This pvscsi driver is for a new SCSI HBA; how does it
matter if this SCSI HBA is actually a virtual HBA implemented by the
hypervisor in software?

>
> > We can not do that. Our backend expects that each slot on the ring is
> > in a particular format. Where as vring expects that each slot on the
> > vring is in the vring_desc format.
>
> Your backend is a software server, surely?

Yes it is, but the backend is as good as written in stone, as it is
being supported by our various products which are out in the market. The
pvscsi driver that I proposed for mainlining has also been in existence
for some time now and was being used/tested heavily. Earlier we used to
distribute it as part of our open-vm-tools project, and it is now that
we are proposing to integrate it with mainline.

So if you are hinting that the backend can be changed since it is
software, the answer is no. The reason being, there are existing
implementations that have that device support and we still want newer
guests to make use of that backend implementation.

> > 3. Also, the way we communicate with the hypervisor backend is that the
> > driver writes to our device IO registers in a particular format. The
> > format that we follow is to first write the command on the
> > COMMAND_REGISTER and then write a stream of data words in the
> > DATA_REGISTER, which is a normal device interface.
> > The reason I make this point is to highlight we are not making any
> > hypercalls instead we communicate with the hypervisor by writing to
> > IO/Memory mapped regions. So from that perspective the driver has no
> > knowledge that its is talking to a software backend (aka device
> > emulation) instead it is very similar to how a driver talks to a silicon
> > device. The backend expects things in a certain way and we cannot
> > really change that interface ( i.e. the ABI shared between Device driver
> > and Device Emulation).
> >
> > So sharing code with vring or virtio is not something that works well
> > with our backend. The VMware PVSCSI driver is simply a virtual HBA and
> > shouldn't be looked at any differently.
> >
> > Is their anything else that you are asking us to standardize ?
>
> I'm not really asking you to standardise anything (yet). I was more
> probing for why you hadn't included any of the SCSI control plane
> interfaces and what lead you do produce a different design from the
> current patterns in virtual I/O. I think what I'm hearing is "Because
> we didn't look at how modern SCSI drivers are constructed" and "Because
> we didn't look at how virtual I/O is currently done in Linux". That's
> OK (it's depressingly familiar in drivers),

I am sorry, that's not the case; the reason we have a different design, as
I have mentioned above, is that we want a generic mechanism which works
for all/most of the GOS's out there and doesn't need to be specific to
Linux.

> but now we get to figure out
> what, if anything, makes sense from a SCSI control plane to a hypervisor
> interface and whether this approach to hypervisor interfaces is better
> or worse than virtio.

I guess these points are answered above. Let me know if there is still
something amiss.

Thanks,
Alok

>
> James
>
>

2009-09-03 20:03:16

by James Bottomley

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Wed, 2009-09-02 at 10:16 -0700, Alok Kataria wrote:
> On Wed, 2009-09-02 at 08:06 -0700, James Bottomley wrote:
> > On Tue, 2009-09-01 at 19:55 -0700, Alok Kataria wrote:
> > > On Tue, 2009-09-01 at 11:15 -0700, James Bottomley wrote:
> > > > On Tue, 2009-09-01 at 10:41 -0700, Alok Kataria wrote:
> > > > > > lguest uses the sg_ring abstraction. Xen and KVM were certainly looking
> > > > > > at this too.
> > > > >
> > > > > I don't see the sg_ring abstraction that you are talking about. Can you
> > > > > please give me some pointers.
> > > >
> > > > it's in drivers/lguest ... apparently it's vring now and the code is in
> > > > driver/virtio
> > > >
> > > > > Also regarding Xen and KVM I think they are using the xenbus/vbus
> > > > > interface, which is quite different than what we do here.
> > > >
> > > > Not sure about Xen ... KVM uses virtio above.
> > > >
> > > > > >
> > > > > > > And anyways how large is the DMA code that we are worrying about here ?
> > > > > > > Only about 300-400 LOC ? I don't think we might want to over-design for
> > > > > > > such small gains.
> > > > > >
> > > > > > So even if you have different DMA code, the remaining thousand or so
> > > > > > lines would be in common. That's a worthwhile improvement.
> > >
> > > I don't see how, the rest of the code comprises of IO/MMIO space & ring
> > > processing which is very different in each of the implementations. What
> > > is left is the setup and initialization code which obviously depends on
> > > the implementation of the driver data structures.
> >
> > Are there benchmarks comparing the two approaches?
>
> Benchmarks comparing what ?

Your approach versus virtio.

> >
> > > > > And not just that, different HV-vendors can have different features,
> > > > > like say XYZ can come up tomorrow and implement the multiple rings
> > > > > interface so the feature set doesn't remain common and we will have less
> > > > > code to share in the not so distant future.
> > > >
> > > > Multiple rings is really just a multiqueue abstraction. That's fine,
> > > > but it needs a standard multiqueue control plane.
> > > >
> > > > The desire to one up the competition by adding a new whiz bang feature
> > > > to which you code a special interface is very common in the storage
> > > > industry. The counter pressure is that consumers really like these
> > > > things standardised. That's what the transport class abstraction is all
> > > > about.
> > > >
> > > > We also seem to be off on a tangent about hypervisor interfaces. I'm
> > > > actually more interested in the utility of an SRP abstraction or at
> > > > least something SAM based. It seems that in your driver you don't quite
> > > > do the task management functions as SAM requests, but do them over your
> > > > own protocol abstractions.
> > >
> > > Okay, I think I need to take a step back here and understand what
> > > actually are you asking for.
> > >
> > > 1. What do you mean by the "transport class abstraction" ?
> > > Do you mean that the way we communicate with the hypervisor needs to be
> > > standardized ?
> >
> > Not really. Transport classes are designed to share code and provide a
> > uniform control plane when the underlying implementation is different.
> >
> > > 2. Are you saying that we should use the virtio ring mechanism to handle
> > > our request and completion rings ?
> >
> > That's an interesting question. Virtio is currently the standard linux
> > guest<=>hypervisor communication mechanism, but if you have comparative
> > benchmarks showing that virtual hardware emulation is faster, it doesn't
> > need to remain so.
>
> It is a standard that KVM and lguest are using. I don't think it needs
> any benchamrks to show if a particular approach is faster or not.

It's a useful data point, especially since the whole object of
paravirtualised drivers is supposed to be speed vs. full hardware
emulation.

> VMware has supported paravirtualized devices in backend for more than an
> year now (may be more, don't quote me on this), and the backend is
> common across different guest OS's. Virtual hardware emulation helps us
> give a common interface to different GOS's, whereas virtio binds this
> heavily to Linux usage. And please note that the backend implementation
> for our virtual device was done before virtio was integrated in
> mainline.

Virtio mainline integration dates from October 2007. The mailing list
discussions obviously predate that by several months.

> Also, from your statements above it seems that you think we are
> proposing to change the standard communication mechanism (between guest
> & hypervisor) for Linux. For the record that's not the case, the
> standard that the Linux based VM's are using does not need to be
> changed. This pvscsi driver is used for a new SCSI HBA, how does it
> matter if this SCSI HBA is actually a virtual HBA and implemented by the
> hypervisor in software.
>
> >
> > > We can not do that. Our backend expects that each slot on the ring is
> > > in a particular format. Where as vring expects that each slot on the
> > > vring is in the vring_desc format.
> >
> > Your backend is a software server, surely?
>
> Yes it is, but the backend is as good as written in stone, as it is
> being supported by our various products which are out in the market. The
> pvscsi driver that I proposed for mainlining has also been in existence
> for some time now and was being used/tested heavily. Earlier we used to
> distribute it as part of our open-vm-tools project, and it is now that
> we are proposing to integrate it with mainline.
>
> So if you are hinting that since the backend is software, it can be
> changed the answer is no. The reason being, their are existing
> implementations that have that device support and we still want newer
> guests to make use of that backend implementation.
>
> > > 3. Also, the way we communicate with the hypervisor backend is that the
> > > driver writes to our device IO registers in a particular format. The
> > > format that we follow is to first write the command on the
> > > COMMAND_REGISTER and then write a stream of data words in the
> > > DATA_REGISTER, which is a normal device interface.
> > > The reason I make this point is to highlight we are not making any
> > > hypercalls instead we communicate with the hypervisor by writing to
> > > IO/Memory mapped regions. So from that perspective the driver has no
> > > knowledge that its is talking to a software backend (aka device
> > > emulation) instead it is very similar to how a driver talks to a silicon
> > > device. The backend expects things in a certain way and we cannot
> > > really change that interface ( i.e. the ABI shared between Device driver
> > > and Device Emulation).
> > >
> > > So sharing code with vring or virtio is not something that works well
> > > with our backend. The VMware PVSCSI driver is simply a virtual HBA and
> > > shouldn't be looked at any differently.
> > >
> > > Is their anything else that you are asking us to standardize ?
> >
> > I'm not really asking you to standardise anything (yet). I was more
> > probing for why you hadn't included any of the SCSI control plane
> > interfaces and what lead you do produce a different design from the
> > current patterns in virtual I/O. I think what I'm hearing is "Because
> > we didn't look at how modern SCSI drivers are constructed" and "Because
> > we didn't look at how virtual I/O is currently done in Linux". That's
> > OK (it's depressingly familiar in drivers),
>
> I am sorry that's not the case, the reason we have different design as I
> have mentioned above is because we want a generic mechanism which works
> for all/most of the GOS's out their and doesn't need to be specific to
> Linux.

Slightly confused now ... you're saying you did look at the transport
class and virtio? But you chose not to do a virtio-like interface (for
reasons which I'm still not clear on) ... I didn't manage to extract
anything from the foregoing about why there is no transport class.

James

> > but now we get to figure out
> > what, if anything, makes sense from a SCSI control plane to a hypervisor
> > interface and whether this approach to hypervisor interfaces is better
> > or worse than virtio.
>
> I guess these points are answered above. Let me know if their is still
> something amiss.
>
> Thanks,
> Alok
>
> >
> > James
> >
> >
>

2009-09-03 20:31:02

by Dmitry Torokhov

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Thursday 03 September 2009 01:03:02 pm James Bottomley wrote:
> > >
> > > I'm not really asking you to standardise anything (yet). I was more
> > > probing for why you hadn't included any of the SCSI control plane
> > > interfaces and what lead you do produce a different design from the
> > > current patterns in virtual I/O. I think what I'm hearing is "Because
> > > we didn't look at how modern SCSI drivers are constructed" and "Because
> > > we didn't look at how virtual I/O is currently done in Linux". That's
> > > OK (it's depressingly familiar in drivers),
> >
> > I am sorry that's not the case, the reason we have different design as I
> > have mentioned above is because we want a generic mechanism which works
> > for all/most of the GOS's out their and doesn't need to be specific to
> > Linux.
>
> Slightly confused now ... you're saying you did look at the transport
> class and virtio? But you chose not to do a virtio like interface (for
> reasons which I'm still not clear on) ...

Virtio is Linux-specific and is not available on the older kernels which
our hypervisor/PVSCSI combination does support. Even if we were to use a
virtio-like scheme in the hypervisor code, we would have to re-implement
much of the virtio code for kernels earlier than those shipped in '07
and do the same for other operating systems, for no apparent benefit.
The PCI device abstraction is self-contained and works well on Windows,
Linux and other guest operating systems, and so it was chosen.

--
Dmitry

2009-09-03 21:27:45

by Ric Wheeler

Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On 09/03/2009 04:31 PM, Dmitry Torokhov wrote:
> On Thursday 03 September 2009 01:03:02 pm James Bottomley wrote:
>
>>>> I'm not really asking you to standardise anything (yet). I was more
>>>> probing for why you hadn't included any of the SCSI control plane
>>>> interfaces and what lead you do produce a different design from the
>>>> current patterns in virtual I/O. I think what I'm hearing is "Because
>>>> we didn't look at how modern SCSI drivers are constructed" and "Because
>>>> we didn't look at how virtual I/O is currently done in Linux". That's
>>>> OK (it's depressingly familiar in drivers),
>>>>
>>> I am sorry, that's not the case; the reason we have a different design,
>>> as I have mentioned above, is that we want a generic mechanism which
>>> works for all/most of the GOS's out there and doesn't need to be
>>> specific to Linux.
>>>
>> Slightly confused now ... you're saying you did look at the transport
>> class and virtio? But you chose not to do a virtio like interface (for
>> reasons which I'm still not clear on) ...
>>
> Virtio is Linux-specific and is not available on the older kernels which
> our hypervisor/PVSCSI combination does support. Even if we were to use a
> virtio-like scheme in the hypervisor code, we would have to re-implement
> much of the virtio code for kernels earlier than those shipped in '07,
> and do the same for other operating systems, for no apparent benefit.
> The PCI device abstraction is self-contained and works well on Windows,
> Linux and other guest operating systems, and so it was chosen.
>
>

Several arguments have a history of never winning when you try to get a
new bit of code into Linux.

Number one among the bad justifications is that your design is good because
it avoids being "Linux specific", closely followed by needing to backport :-)

ric

2009-09-03 21:40:59

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.

On Thursday 03 September 2009 02:21:52 pm Ric Wheeler wrote:
> On 09/03/2009 04:31 PM, Dmitry Torokhov wrote:
> > On Thursday 03 September 2009 01:03:02 pm James Bottomley wrote:
> >
> >>>> I'm not really asking you to standardise anything (yet). I was more
> >>>> probing for why you hadn't included any of the SCSI control plane
> >>>> interfaces and what led you to produce a different design from the
> >>>> current patterns in virtual I/O. I think what I'm hearing is "Because
> >>>> we didn't look at how modern SCSI drivers are constructed" and "Because
> >>>> we didn't look at how virtual I/O is currently done in Linux". That's
> >>>> OK (it's depressingly familiar in drivers),
> >>>>
> >>> I am sorry, that's not the case; the reason we have a different design,
> >>> as I have mentioned above, is that we want a generic mechanism which
> >>> works for all/most of the GOS's out there and doesn't need to be
> >>> specific to Linux.
> >>>
> >> Slightly confused now ... you're saying you did look at the transport
> >> class and virtio? But you chose not to do a virtio like interface (for
> >> reasons which I'm still not clear on) ...
> >>
> > Virtio is Linux-specific and is not available on the older kernels which
> > our hypervisor/PVSCSI combination does support. Even if we were to use a
> > virtio-like scheme in the hypervisor code, we would have to re-implement
> > much of the virtio code for kernels earlier than those shipped in '07,
> > and do the same for other operating systems, for no apparent benefit.
> > The PCI device abstraction is self-contained and works well on Windows,
> > Linux and other guest operating systems, and so it was chosen.
> >
> >
>
> Several arguments have a history of never winning when you try to get a
> new bit of code into Linux.
>
> Number one among the bad justifications is that your design is good because
> it avoids being "Linux specific", closely followed by needing to backport :-)

That is true when you are talking about implementing particular kernel
features. Here, however, our discussion has shifted into the realm of how
and why we implemented the back-end the way we did, and in this particular
case the argument of being OS- and kernel-agnostic is a valid one.

The device is presented to the guest operating system as a PCI device
with particular functionality and is handled as an ordinary device
implemented in silicon. You may argue whether it was the optimal solution
or whether there are better ways to do the same thing, but the device is
in the wild and is here to stay.
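
To make that concrete: a guest driver binds to and drives this device
exactly as it would real silicon. Below is a minimal sketch of such a
binding, with made-up PCI IDs and names rather than the actual PVSCSI
values:

#include <linux/module.h>
#include <linux/pci.h>
#include <linux/io.h>

/* Placeholder IDs for illustration only -- not the real PVSCSI values. */
#define EXAMPLE_PV_VENDOR_ID  0x1234
#define EXAMPLE_PV_DEVICE_ID  0x5678

static int example_pv_probe(struct pci_dev *pdev,
                            const struct pci_device_id *id)
{
        void __iomem *regs;
        int err;

        err = pci_enable_device(pdev);
        if (err)
                return err;

        /* Map the register BAR; from here on it is driven like any HBA. */
        regs = pci_iomap(pdev, 0, 0);
        if (!regs) {
                pci_disable_device(pdev);
                return -ENOMEM;
        }

        pci_set_drvdata(pdev, regs);
        return 0;
}

static void example_pv_remove(struct pci_dev *pdev)
{
        pci_iounmap(pdev, pci_get_drvdata(pdev));
        pci_disable_device(pdev);
}

static const struct pci_device_id example_pv_ids[] = {
        { PCI_DEVICE(EXAMPLE_PV_VENDOR_ID, EXAMPLE_PV_DEVICE_ID) },
        { }
};
MODULE_DEVICE_TABLE(pci, example_pv_ids);

static struct pci_driver example_pv_driver = {
        .name     = "example_pv_hba",
        .id_table = example_pv_ids,
        .probe    = example_pv_probe,
        .remove   = example_pv_remove,
};

static int __init example_pv_init(void)
{
        return pci_register_driver(&example_pv_driver);
}

static void __exit example_pv_exit(void)
{
        pci_unregister_driver(&example_pv_driver);
}

module_init(example_pv_init);
module_exit(example_pv_exit);
MODULE_LICENSE("GPL");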

--
Dmitry

2009-09-04 03:28:35

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] SCSI driver for VMware's virtual HBA.


On Thu, 2009-09-03 at 13:03 -0700, James Bottomley wrote:
> On Wed, 2009-09-02 at 10:16 -0700, Alok Kataria wrote:
> > On Wed, 2009-09-02 at 08:06 -0700, James Bottomley wrote:
> > > On Tue, 2009-09-01 at 19:55 -0700, Alok Kataria wrote:
> > > > On Tue, 2009-09-01 at 11:15 -0700, James Bottomley wrote:
> > > > > On Tue, 2009-09-01 at 10:41 -0700, Alok Kataria wrote:
> > > > > > > lguest uses the sg_ring abstraction. Xen and KVM were certainly looking
> > > > > > > at this too.
> > > > > >
> > > > > > I don't see the sg_ring abstraction that you are talking about. Can you
> > > > > > please give me some pointers?
> > > > >
> > > > > it's in drivers/lguest ... apparently it's vring now and the code is in
> > > > > drivers/virtio
> > > > >
> > > > > > Also regarding Xen and KVM I think they are using the xenbus/vbus
> > > > > > interface, which is quite different from what we do here.
> > > > >
> > > > > Not sure about Xen ... KVM uses virtio above.
> > > > >
> > > > > > >
> > > > > > > > And anyway, how large is the DMA code that we are worrying about here?
> > > > > > > > Only about 300-400 LOC? I don't think we should over-design for
> > > > > > > > such small gains.
> > > > > > >
> > > > > > > So even if you have different DMA code, the remaining thousand or so
> > > > > > > lines would be in common. That's a worthwhile improvement.
> > > >
> > > > I don't see how; the rest of the code comprises IO/MMIO space and ring
> > > > processing, which is very different in each of the implementations. What
> > > > is left is the setup and initialization code, which obviously depends on
> > > > the implementation of the driver data structures.
> > >
> > > Are there benchmarks comparing the two approaches?
> >
> > Benchmarks comparing what?
>
> Your approach versus virtio.
>
> > >
> > > > > > And not just that: different HV vendors can have different features.
> > > > > > Say XYZ comes up tomorrow and implements the multiple-rings
> > > > > > interface; then the feature set doesn't remain common and we will
> > > > > > have less code to share in the not-so-distant future.
> > > > >
> > > > > Multiple rings is really just a multiqueue abstraction. That's fine,
> > > > > but it needs a standard multiqueue control plane.
> > > > >
> > > > > The desire to one-up the competition by adding a new whiz-bang feature
> > > > > to which you code a special interface is very common in the storage
> > > > > industry. The counter-pressure is that consumers really like these
> > > > > things standardised. That's what the transport class abstraction is all
> > > > > about.
> > > > >
> > > > > We also seem to be off on a tangent about hypervisor interfaces. I'm
> > > > > actually more interested in the utility of an SRP abstraction or at
> > > > > least something SAM based. It seems that in your driver you don't quite
> > > > > do the task management functions as SAM requests, but do them over your
> > > > > own protocol abstractions.
> > > >
> > > > Okay, I think I need to take a step back here and understand what
> > > > you are actually asking for.
> > > >
> > > > 1. What do you mean by the "transport class abstraction"?
> > > > Do you mean that the way we communicate with the hypervisor needs to be
> > > > standardized?
> > >
> > > Not really. Transport classes are designed to share code and provide a
> > > uniform control plane when the underlying implementation is different.
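
As a concrete reference point for what a transport class hookup looks
like, this is roughly what ibmvscsi does with the SRP transport class
today; the names below are illustrative and this is only a sketch, not a
proposal for PVSCSI:

#include <linux/errno.h>
#include <scsi/scsi_host.h>
#include <scsi/scsi_transport_srp.h>

/* Empty function template, as ibmvscsi registers; illustrative only. */
static struct srp_function_template example_srp_ft = {
};

static struct scsi_transport_template *example_transport;

/* Attach the shared SRP transport class before registering the host. */
static int example_attach_host(struct Scsi_Host *shost, struct device *dev)
{
        example_transport = srp_attach_transport(&example_srp_ft);
        if (!example_transport)
                return -ENOMEM;

        shost->transportt = example_transport;
        return scsi_add_host(shost, dev);
}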
> > >
> > > > 2. Are you saying that we should use the virtio ring mechanism to handle
> > > > our request and completion rings?
> > >
> > > That's an interesting question. Virtio is currently the standard Linux
> > > guest<=>hypervisor communication mechanism, but if you have comparative
> > > benchmarks showing that virtual hardware emulation is faster, it doesn't
> > > need to remain so.
> >
> > It is a standard that KVM and lguest are using. I don't think it needs
> > any benchmarks to show whether a particular approach is faster or not.
>
> It's a useful datapoint especially since the whole object of
> paravirtualised drivers is supposed to be speed vs full hardware
> emulation.
>
> > VMware has supported paravirtualized devices in the backend for more than
> > a year now (maybe more, don't quote me on this), and the backend is
> > common across different guest OS's. Virtual hardware emulation helps us
> > give a common interface to different GOS's, whereas virtio binds this
> > heavily to Linux usage. And please note that the backend implementation
> > for our virtual device was done before virtio was integrated into
> > mainline.
>
> Virtio mainline integration dates from October 2007. The mailing list
> discussions obviously predate that by several months.
>
> > Also, from your statements above it seems that you think we are
> > proposing to change the standard communication mechanism (between guest
> > & hypervisor) for Linux. For the record, that's not the case; the
> > standard that the Linux-based VMs are using does not need to be
> > changed. This pvscsi driver is used for a new SCSI HBA; how does it
> > matter that this SCSI HBA is actually a virtual HBA implemented by the
> > hypervisor in software?
> >
> > >
> > > > We cannot do that. Our backend expects that each slot on the ring is
> > > > in a particular format, whereas vring expects that each slot on the
> > > > vring is in the vring_desc format.
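
For comparison, a vring slot is described by struct vring_desc (see
include/linux/virtio_ring.h), which looks roughly like this:

#include <linux/types.h>

/* One slot of a virtio ring (layout from include/linux/virtio_ring.h). */
struct vring_desc {
        __u64 addr;   /* guest-physical address of the buffer */
        __u32 len;    /* length of the buffer in bytes */
        __u16 flags;  /* VRING_DESC_F_NEXT / _WRITE / _INDIRECT */
        __u16 next;   /* index of the next descriptor in a chain */
};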
> > >
> > > Your backend is a software server, surely?
> >
> > Yes it is, but the backend is as good as written in stone, as it is
> > supported by our various products which are out in the market. The
> > pvscsi driver that I proposed for mainlining has also been in existence
> > for some time now and has been used and tested heavily. Earlier we
> > distributed it as part of our open-vm-tools project, and we are now
> > proposing to integrate it into mainline.
> >
> > So if you are hinting that, since the backend is software, it can be
> > changed, the answer is no. The reason is that there are existing
> > implementations that have that device support, and we still want newer
> > guests to make use of that backend implementation.
> >
> > > > 3. Also, the way we communicate with the hypervisor backend is that the
> > > > driver writes to our device IO registers in a particular format. The
> > > > format that we follow is to first write the command to the
> > > > COMMAND_REGISTER and then write a stream of data words to the
> > > > DATA_REGISTER, which is a normal device interface.
> > > > The reason I make this point is to highlight that we are not making any
> > > > hypercalls; instead we communicate with the hypervisor by writing to
> > > > IO/memory-mapped regions. So from that perspective the driver has no
> > > > knowledge that it is talking to a software backend (aka device
> > > > emulation); it is very similar to how a driver talks to a silicon
> > > > device. The backend expects things in a certain way and we cannot
> > > > really change that interface (i.e. the ABI shared between the device
> > > > driver and the device emulation).
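
In code, that protocol amounts to something like the sketch below; the
register names and offsets here are illustrative, not our exact ABI:

#include <linux/io.h>
#include <linux/types.h>

/* Illustrative register offsets -- not the exact PVSCSI layout. */
#define EXAMPLE_REG_COMMAND       0x0
#define EXAMPLE_REG_COMMAND_DATA  0x4

/*
 * Issue a command to the emulated device: write the command code to the
 * command register, then stream the argument words into the data register.
 * 'iobase' is the ioremap()ed MMIO region from the device's PCI BAR.
 */
static void example_write_cmd(void __iomem *iobase, u32 cmd,
                              const u32 *args, unsigned int nwords)
{
        unsigned int i;

        writel(cmd, iobase + EXAMPLE_REG_COMMAND);
        for (i = 0; i < nwords; i++)
                writel(args[i], iobase + EXAMPLE_REG_COMMAND_DATA);
}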
> > > >
> > > > So sharing code with vring or virtio is not something that works well
> > > > with our backend. The VMware PVSCSI driver is simply a virtual HBA and
> > > > shouldn't be looked at any differently.
> > > >
> > > > Is there anything else that you are asking us to standardize?
> > >
> > > I'm not really asking you to standardise anything (yet). I was more
> > > probing for why you hadn't included any of the SCSI control plane
> > > interfaces and what led you to produce a different design from the
> > > current patterns in virtual I/O. I think what I'm hearing is "Because
> > > we didn't look at how modern SCSI drivers are constructed" and "Because
> > > we didn't look at how virtual I/O is currently done in Linux". That's
> > > OK (it's depressingly familiar in drivers),
> >
> > I am sorry, that's not the case; the reason we have a different design,
> > as I have mentioned above, is that we want a generic mechanism which
> > works for all/most of the GOS's out there and doesn't need to be
> > specific to Linux.
>
> Slightly confused now ... you're saying you did look at the transport
> class and virtio? But you chose not to do a virtio like interface (for
> reasons which I'm still not clear on) ...

Dmitry has answered all of these questions, so let me skip them.

> I didn't manage to extract
> anything about why no transport class from the foregoing.

I still don't understand the transport class requirement.
I don't see how it will benefit either VMware's driver or other Linux
SCSI code, nor do I understand how it helps reduce the code.

My point is that even if we abstract the transport protocol code, the
rest of the device implementation is still going to remain different for
each virtualization solution.

If you don't agree, can you please be a little more explicit and explain
what exactly you are asking for?

--Alok
>
> James
>
> > > but now we get to figure out
> > > what, if anything, makes sense from a SCSI control plane to a hypervisor
> > > interface and whether this approach to hypervisor interfaces is better
> > > or worse than virtio.
> >
> > I guess these points are answered above. Let me know if there is still
> > something amiss.
> >
> > Thanks,
> > Alok
> >
> > >
> > > James
> > >
> > >
> >
>