2015-11-19 13:45:52

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Tue, Oct 27, 2015 at 11:38:57PM -0700, Andy Lutomirski wrote:
> This switches virtio to use the DMA API unconditionally. I'm sure
> it breaks things, but it seems to work on x86 using virtio-pci, with
> and without Xen, and using both the modern 1.0 variant and the
> legacy variant.

So thinking hard about it, I don't see any real drawbacks to making this
conditional on a new feature bit, that Xen can then set.

As a bonus, host can distinguish between old and new guests using the
feature bit, even though making driver *control* whether IOMMU is
bypassed makes userspace drivers unsafe, so might not be a good idea.

A tiny bit more code but not by much, and we clearly won't
be breaking anything that's not already broken,
and we will be able to drop the extra code later
if we think it's a good idea.

I'll run this by the virtio TC on OASIS next week so we
can reserve a feature bit.

> Changes from v2:
> - Fix really embarrassing bug. This version actually works.
>
> Changes from v1:
> - Fix an endian conversion error causing a BUG to hit.
> - Fix a DMA ordering issue (swiotlb=force works now).
> - Minor cleanups.
>
> Andy Lutomirski (3):
> virtio_net: Stop doing DMA from the stack
> virtio_ring: Support DMA APIs
> virtio_pci: Use the DMA API
>
> drivers/net/virtio_net.c | 53 +++++++----
> drivers/virtio/Kconfig | 2 +-
> drivers/virtio/virtio_pci_common.h | 3 +-
> drivers/virtio/virtio_pci_legacy.c | 19 +++-
> drivers/virtio/virtio_pci_modern.c | 34 +++++--
> drivers/virtio/virtio_ring.c | 187 ++++++++++++++++++++++++++++++-------
> tools/virtio/linux/dma-mapping.h | 17 ++++
> 7 files changed, 246 insertions(+), 69 deletions(-)
> create mode 100644 tools/virtio/linux/dma-mapping.h
>
> --
> 2.4.3


2015-11-19 21:59:28

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Nov 19, 2015 5:45 AM, "Michael S. Tsirkin" <[email protected]> wrote:
>
> On Tue, Oct 27, 2015 at 11:38:57PM -0700, Andy Lutomirski wrote:
> > This switches virtio to use the DMA API unconditionally. I'm sure
> > it breaks things, but it seems to work on x86 using virtio-pci, with
> > and without Xen, and using both the modern 1.0 variant and the
> > legacy variant.
>
> So thinking hard about it, I don't see any real drawbacks to making this
> conditional on a new feature bit, that Xen can then set..

Can you elaborate? If I run QEMU, hosting Xen, hosting Linux, and the
virtio device is provided by QEMU, then how does Xen set the bit?
Similarly, how would Xen set the bit for a real physical device?


--Andy

2015-11-19 23:38:18

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Thu, 2015-11-19 at 13:59 -0800, Andy Lutomirski wrote:
>
> >
> > So thinking hard about it, I don't see any real drawbacks to making this
> > conditional on a new feature bit, that Xen can then set..
>
> Can you elaborate?  If I run QEMU, hosting Xen, hosting Linux, and the
> virtio device is provided by QEMU, then how does Xen set the bit?
> Similarly, how would Xen set the bit for a real physical device?

Right. This is *not* a fundamental characteristic of the device. This
is all about how your *particular* hypervisor (in the set of turtles-
all-the-way-down) happened to expose the thing to you.

This is why it lives in the DMAR table, in the Intel world, which
*tells* you which devices are behind which IOMMU (and which are not).
And why I keep repeating myself that it has nothing to do with the
actual device or the virtio drivers.

I understand that POWER and other platforms don't currently have a
clean way to indicate that certain device don't have translation. And I
understand that we may end up with a *quirk* which ensures that the DMA
API does the right thing (i.e. nothing) in certain cases.

But we should *NOT* be involving the virtio device drivers in that
quirk, in any way. And putting a feature bit in the virtio device
itself doesn't seem at all sane either.

Bear in mind that qemu-system-x86_64 currently has the *same* problem
with assigned physical devices. It's claiming they're translated, and
they're not.

--
dwmw2


Attachments:
smime.p7s (5.56 kB)

2015-11-20 02:57:47

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Thu, 2015-11-19 at 23:38 +0000, David Woodhouse wrote:
>
> I understand that POWER and other platforms don't currently have a
> clean way to indicate that certain device don't have translation. And I
> understand that we may end up with a *quirk* which ensures that the DMA
> API does the right thing (i.e. nothing) in certain cases.
>
> But we should *NOT* be involving the virtio device drivers in that
> quirk, in any way. And putting a feature bit in the virtio device
> itself doesn't seem at all sane either.
>
> Bear in mind that qemu-system-x86_64 currently has the *same* problem
> with assigned physical devices. It's claiming they're translated, and
> they're not.

It's not that clear but yeah ... as I mentioned, I can't find a
way to do that quirk that won't break when we want to actually use
the iommu... 

Ben.

2015-11-20 06:56:56

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Thu, Nov 19, 2015 at 01:59:05PM -0800, Andy Lutomirski wrote:
> On Nov 19, 2015 5:45 AM, "Michael S. Tsirkin" <[email protected]> wrote:
> >
> > On Tue, Oct 27, 2015 at 11:38:57PM -0700, Andy Lutomirski wrote:
> > > This switches virtio to use the DMA API unconditionally. I'm sure
> > > it breaks things, but it seems to work on x86 using virtio-pci, with
> > > and without Xen, and using both the modern 1.0 variant and the
> > > legacy variant.
> >
> > So thinking hard about it, I don't see any real drawbacks to making this
> > conditional on a new feature bit, that Xen can then set..
>
> Can you elaborate? If I run QEMU, hosting Xen, hosting Linux, and the
> virtio device is provided by QEMU, then how does Xen set the bit?

You would run QEMU with the appropriate flag. E.g.
-global virtio-pci,use_platform_dma=on

> Similarly, how would Xen set the bit for a real physical device?
>
>
> --Andy

There's no need to set bits for physical devices I think: from security
point of view, using them from a VM isn't very different from using them
from host.



--
MST

2015-11-20 07:47:37

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Fri, Nov 20, 2015 at 08:56:46AM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 19, 2015 at 01:59:05PM -0800, Andy Lutomirski wrote:
> > On Nov 19, 2015 5:45 AM, "Michael S. Tsirkin" <[email protected]> wrote:
> > >
> > > On Tue, Oct 27, 2015 at 11:38:57PM -0700, Andy Lutomirski wrote:
> > > > This switches virtio to use the DMA API unconditionally. I'm sure
> > > > it breaks things, but it seems to work on x86 using virtio-pci, with
> > > > and without Xen, and using both the modern 1.0 variant and the
> > > > legacy variant.
> > >
> > > So thinking hard about it, I don't see any real drawbacks to making this
> > > conditional on a new feature bit, that Xen can then set..
> >
> > Can you elaborate? If I run QEMU, hosting Xen, hosting Linux, and the
> > virtio device is provided by QEMU, then how does Xen set the bit?
>
> You would run QEMU with the appropriate flag. E.g.
> -global virtio-pci,use_platform_dma=on

Or Xen code within QEMU can tweak this global internally
so users don't need to care.

> > Similarly, how would Xen set the bit for a real physical device?
> >
> >
> > --Andy
>
> There's no need to set bits for physical devices I think: from security
> point of view, using them from a VM isn't very different from using them
> from host.
>
>
>
> --
> MST

2015-11-20 08:21:16

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Thu, Nov 19, 2015 at 11:38:06PM +0000, David Woodhouse wrote:
> On Thu, 2015-11-19 at 13:59 -0800, Andy Lutomirski wrote:
> >
> > >
> > > So thinking hard about it, I don't see any real drawbacks to making this
> > > conditional on a new feature bit, that Xen can then set..
> >
> > Can you elaborate?? If I run QEMU, hosting Xen, hosting Linux, and the
> > virtio device is provided by QEMU, then how does Xen set the bit?
> > Similarly, how would Xen set the bit for a real physical device?
>
> Right. This is *not* a fundamental characteristic of the device. This
> is all about how your *particular* hypervisor (in the set of turtles-
> all-the-way-down) happened to expose the thing to you.
>
> This is why it lives in the DMAR table, in the Intel world, which
> *tells* you which devices are behind which IOMMU (and which are not).

David, there are two things a hypervisor needs to tell the guest.
1. The actual device is behind an IOMMU. This is what you
are suggesting we use DMAR for.
2. Using IOMMU from kernel (as opposed to from userspace with VFIO)
actually adds security. For exising virtio devices on KVM,
the answer is no. And DMAR has no way to reflect that.

Question 2 only makes sense if you answer yes to question 1 and if user
wants protection from malicious devices with iommu=on, and
if you care about getting good performance from *other*
devices. And what guest would do is use 1:1 for the
devices where answer 2 is "no".

Maybe for now I should just give up and say "don't use iommu=on within
VMs if you want any performance". But the point is, if we just fix QEMU
to actually obey IOMMU mappings for assigned devices, then there's
already a kind of answer with virtio being trusted since it's part of
hypervisor, all this without guest changes. Seems kind of sad to let
performance regress.

So a (yet another) feature bit would be a possible solution there, but
we don't seem to be able to even agree on using a feature bit for a
quirk.


> And why I keep repeating myself that it has nothing to do with the
> actual device or the virtio drivers.
>
> I understand that POWER and other platforms don't currently have a
> clean way to indicate that certain device don't have translation. And I
> understand that we may end up with a *quirk* which ensures that the DMA
> API does the right thing (i.e. nothing) in certain cases.

So assuming we forget about 2 above for now, then yes, all we need
is a quirk, using some logic to detect these systems.

> But we should *NOT* be involving the virtio device drivers in that
> quirk, in any way. And putting a feature bit in the virtio device
> itself doesn't seem at all sane either.

Only if there's some other device that benefits from all this work. If
virtio is the only one that benefits, then why do we want to
spread the quirk rules around so much? A feature bit gives us
a single, portable rule that the quirk can use on all platforms.

> Bear in mind that qemu-system-x86_64 currently has the *same* problem
> with assigned physical devices. It's claiming they're translated, and
> they're not.
>
> --
> dwmw2
>

Presumably people either don't assign
devices or don't have an iommu otherwise things won't work for them,
but if they do have an iommu and don't assign devices, then Andy's
patch will break them.

This is not QEMU specific unfortunately, we don't know who
might have implemented virtio.





--
MST

2015-11-20 08:34:30

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Fri, Nov 20, 2015 at 01:56:39PM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2015-11-19 at 23:38 +0000, David Woodhouse wrote:
> >
> > I understand that POWER and other platforms don't currently have a
> > clean way to indicate that certain device don't have translation. And I
> > understand that we may end up with a *quirk* which ensures that the DMA
> > API does the right thing (i.e. nothing) in certain cases.
> >
> > But we should *NOT* be involving the virtio device drivers in that
> > quirk, in any way. And putting a feature bit in the virtio device
> > itself doesn't seem at all sane either.
> >
> > Bear in mind that qemu-system-x86_64 currently has the *same* problem
> > with assigned physical devices. It's claiming they're translated, and
> > they're not.
>
> It's not that clear but yeah ... as I mentioned, I can't find a
> way to do that quirk that won't break when we want to actually use
> the iommu...?
>
> Ben.

Yes, I am not at all sure we need a quirk for assigned devices.
Better teach QEMU to make iommu work for them.


--
MST

2015-11-22 15:58:37

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Fri, 2015-11-20 at 10:21 +0200, Michael S. Tsirkin wrote:
>
> David, there are two things a hypervisor needs to tell the guest.
> 1. The actual device is behind an IOMMU. This is what you
>    are suggesting we use DMAR for.
> 2. Using IOMMU from kernel (as opposed to from userspace with VFIO)
>    actually adds security. For exising virtio devices on KVM,
>    the answer is no. And DMAR has no way to reflect that.

Using the IOMMU from the kernel *always* adds security. It protects
against device driver (and device) bugs which can be made exploitable
by allowing DMA to anywhere in the system.

Sure, there are classes of that which are far more interesting, for
example where you give the whole device to a guest and let it load the
firmware. But "we trust the hypervisor" and "we trust the hardware" are
not *so* far apart conceptually.

Hell, with ATS you *still* have to trust the hardware to a large
extent.

I really think that something like the proposed DMA_ATTR_IOMMU_BYPASS
should suffice for the "who cares about security; we want performance"
case.

-- 
dwmw2


Attachments:
smime.p7s (5.56 kB)

2015-11-22 21:52:39

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Sun, Nov 22, 2015 at 03:58:28PM +0000, David Woodhouse wrote:
> On Fri, 2015-11-20 at 10:21 +0200, Michael S. Tsirkin wrote:
> >
> > David, there are two things a hypervisor needs to tell the guest.
> > 1. The actual device is behind an IOMMU. This is what you
> > ?? are suggesting we use DMAR for.
> > 2. Using IOMMU from kernel (as opposed to from userspace with VFIO)
> > ?? actually adds security. For exising virtio devices on KVM,
> > ?? the answer is no. And DMAR has no way to reflect that.
>
> Using the IOMMU from the kernel *always* adds security. It protects
> against device driver (and device) bugs which can be made exploitable
> by allowing DMA to anywhere in the system.

No - speaking about QEMU/KVM here - you are not "allowing" DMA - by
programming the virtual IOMMU you are asking the hypervisor nicely to do
that. If it's buggy, it can ignore you and there's nothing you can do.

As with any random change in the system, some bugs might get masked and
become non-exploitable, but then some other bugs might surface and
become exploitable.

I gather that e.g. Xen is different.


> Sure, there are classes of that which are far more interesting, for
> example where you give the whole device to a guest and let it load the
> firmware. But "we trust the hypervisor" and "we trust the hardware" are
> not *so* far apart conceptually.

Depends on the hypervisor I guess. At least for QEMU/KVM, one conceptual
difference is that we actually could have the hypervisor tell us whether
a specific device has to be trusted, or can be protected against, and
user can actually read the code and verify that QEMU is doing the right
thing.

Hardware is closed source so harder to trust.

> Hell, with ATS you *still* have to trust the hardware to a large
> extent.
>
> I really think that something like the proposed DMA_ATTR_IOMMU_BYPASS
> should suffice

I'm not sure how that is supposed to be used - does
the driver request DMA_ATTR_IOMMU_BYPASS at setup time?

If yes then I think that will work for virtio -
we can just set that in the driver.

> for the "who cares about security; we want performance"
> case.
>
> --?
> dwmw2
>

There's that, and there's an "I care about security, but
do not want to burn up cycles on fake protections that
do not work" case.


--
MST

2015-11-22 23:04:15

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff



> There's that, and there's an "I care about security, but
> do not want to burn up cycles on fake protections that
> do not work" case.

It would seem to make most sense for this use case simply *not* to expose
virtio devices to guests as being behind an IOMMU at all. Sure, there are
esoteric use cases where the guest actually nests and runs further guests
inside itself and wants to pass through the virtio devices from the real
hardware host. But presumably those configurations will have multiple
virtio devices assigned by the host anyway, and further tweaking the
configuration to put them behind an IOMMU shouldn't be hard.


--
dwmw2

2015-11-22 23:04:10

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff



> There's that, and there's an "I care about security, but
> do not want to burn up cycles on fake protections that
> do not work" case.

It would seem to make most sense for this use case simply *not* to expose
virtio devices to guests as being behind an IOMMU at all. Sure, there are
esoteric use cases where the guest actually nests and runs further guests
inside itself and wants to pass through the virtio devices from the real
hardware host. But presumably those configurations will have multiple
virtio devices assigned by the host anyway, and further tweaking the
configuration to put them behind an IOMMU shouldn't be hard.


--
dwmw2

2015-11-23 07:57:07

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] virtio DMA API core stuff

On Sun, Nov 22, 2015 at 10:21:34PM -0000, David Woodhouse wrote:
>
>
> > There's that, and there's an "I care about security, but
> > do not want to burn up cycles on fake protections that
> > do not work" case.
>
> It would seem to make most sense for this use case simply *not* to expose
> virtio devices to guests as being behind an IOMMU at all. Sure, there are
> esoteric use cases where the guest actually nests and runs further guests
> inside itself and wants to pass through the virtio devices from the real
> hardware host. But presumably those configurations will have multiple
> virtio devices assigned by the host anyway, and further tweaking the
> configuration to put them behind an IOMMU shouldn't be hard.

Unfortunately it's a no-go: this breaks the much less esoteric usecase
of DPDK: using virtio devices with userspace drivers.

Well - not breaks as such as this doesn't currently work,
but this approach would prevent us from making it work.

>
> --
> dwmw2