2020-02-20 16:08:52

by Halil Pasic

Subject: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

Currently force_dma_unencrypted() is only used by the direct
implementation of the DMA API, and thus resides in dma-direct.h. But
there is nothing dma-direct specific about it: if one were -- for
whatever reason -- to implement custom DMA ops that have to work in the
encrypted/protected scenarios dma-direct currently deals with, one would
need exactly this kind of information.

More importantly, virtio has to use the DMA API (so that the memory
encryption (x86) or protection (power, s390) is handled) under the very
same circumstances in which force_dma_unencrypted() returns true.
Furthermore, in these cases the reason to go via the DMA API is
distinct from the reason indicated by VIRTIO_F_IOMMU_PLATFORM: we need
to use the DMA API independently of the device's properties with regard
to access to memory. I.e. the addresses in the descriptors are still
guest physical addresses, the device may still be implemented by a SMP
CPU, and thus the device shall use those without any further
translation. See [1].

Let's move force_dma_unencrypted() to mem_encrypt.h so that virtio, or
other implementations of DMA ops, can make the right decisions.

[1] https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4100006
(In the spec VIRTIO_F_IOMMU_PLATFORM is called
VIRTIO_F_ACCESS_PLATFORM).

Signed-off-by: Halil Pasic <[email protected]>
Tested-by: Ram Pai <[email protected]>
Tested-by: Michael Mueller <[email protected]>
---
include/linux/dma-direct.h | 9 ---------
include/linux/mem_encrypt.h | 10 ++++++++++
2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 24b8684aa21d..590b15d881b0 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -26,15 +26,6 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dev_addr)
}
#endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */

-#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
-bool force_dma_unencrypted(struct device *dev);
-#else
-static inline bool force_dma_unencrypted(struct device *dev)
-{
- return false;
-}
-#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
-
/*
* If memory encryption is supported, phys_to_dma will set the memory encryption
* bit in the DMA address, and dma_to_phys will clear it. The raw __phys_to_dma
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 5c4a18a91f89..64a48c4b01a2 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -22,6 +22,16 @@ static inline bool mem_encrypt_active(void) { return false; }

#endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */

+struct device;
+#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
+bool force_dma_unencrypted(struct device *dev);
+#else
+static inline bool force_dma_unencrypted(struct device *dev)
+{
+ return false;
+}
+#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
/*
* The __sme_set() and __sme_clr() macros are useful for adding or removing
--
2.17.1
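
(For orientation: the kind of consumer this move enables would look roughly
like the sketch below -- a transport-level "should virtio use the DMA API?"
check that consults force_dma_unencrypted(). This is only an illustration of
the idea in the commit message, not the actual patch 2/2; vring_use_dma_api()
and virtio_has_iommu_quirk() do exist in drivers/virtio/virtio_ring.c and
include/linux/virtio_config.h, but the placement of the new check is an
assumption.)

/* Sketch only -- not the real patch 2/2. Context: drivers/virtio/virtio_ring.c */
#include <linux/mem_encrypt.h>    /* force_dma_unencrypted(), after this patch */
#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <xen/xen.h>

static bool vring_use_dma_api(struct virtio_device *vdev)
{
	/* VIRTIO_F_IOMMU_PLATFORM was negotiated: always use the DMA API. */
	if (!virtio_has_iommu_quirk(vdev))
		return true;

	/* Xen grant mappings need the DMA API as well. */
	if (xen_domain())
		return true;

	/*
	 * Hypothetical addition enabled by this move: memory is encrypted or
	 * protected, so the DMA API (and its bounce buffers) must be used
	 * even though the addresses handed to the device stay guest-physical.
	 */
	if (force_dma_unencrypted(vdev->dev.parent))
		return true;

	return false;
}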


2020-02-20 16:12:45

by Christoph Hellwig

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

On Thu, Feb 20, 2020 at 05:06:05PM +0100, Halil Pasic wrote:
> Currently force_dma_unencrypted() is only used by the direct
> implementation of the DMA API, and thus resides in dma-direct.h. But
> there is nothing dma-direct specific about it: if one was -- for
> whatever reason -- to implement custom DMA ops that have to in the
> encrypted/protected scenarios dma-direct currently deals with, one would
> need exactly this kind of information.

I really don't think it has any business being anywhere else, and your
completely bogus second patch just proves the point.

2020-02-20 16:25:03

by Christian Borntraeger

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h



On 20.02.20 17:11, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 05:06:05PM +0100, Halil Pasic wrote:
>> Currently force_dma_unencrypted() is only used by the direct
>> implementation of the DMA API, and thus resides in dma-direct.h. But
>> there is nothing dma-direct specific about it: if one was -- for
>> whatever reason -- to implement custom DMA ops that have to in the
>> encrypted/protected scenarios dma-direct currently deals with, one would
>> need exactly this kind of information.
>
> I really don't think it has business being anywhre else, and your completely
> bogus second patch just proves the point.

From a user's perspective it makes absolutely perfect sense to use the
bounce buffers when they are NEEDED.
Forcing the user to specify iommu_platform just because you need bounce buffers
really feels wrong. And obviously we have a severe performance issue
because of the indirections.

Now: I understand that you want to get this fixed differently, but maybe
you could help outline how this could be fixed properly.

Christian

2020-02-20 16:33:30

by Christoph Hellwig

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> From a users perspective it makes absolutely perfect sense to use the
> bounce buffers when they are NEEDED.
> Forcing the user to specify iommu_platform just because you need bounce buffers
> really feels wrong. And obviously we have a severe performance issue
> because of the indirections.

The point is that the user should not have to specify iommu_platform.
We need to make sure any new hypervisor (especially one that might require
bounce buffering) always sets it; not setting it was a rather bogus legacy
hack that isn't extensible for cases that, for example, require bounce buffering.

2020-02-20 17:01:49

by Christian Borntraeger

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h



On 20.02.20 17:31, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
>> From a users perspective it makes absolutely perfect sense to use the
>> bounce buffers when they are NEEDED.
>> Forcing the user to specify iommu_platform just because you need bounce buffers
>> really feels wrong. And obviously we have a severe performance issue
>> because of the indirections.
>
> The point is that the user should not have to specify iommu_platform.

I (and Halil) agree on that. From a user perspective we want to
have the right thing in the guest without any command option. The iommu_platform
thing was indeed a first step to make things work.

I might be misreading Halil's patch, but isn't one effect of this patch set
that things WILL work without iommu_platform?


> We need to make sure any new hypervisor (especially one that might require
> bounce buffering) always sets it, as was a rather bogus legacy hack
> that isn't extensibe for cases that for example require bounce buffering.

2020-02-21 03:28:12

by David Gibson

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > From a users perspective it makes absolutely perfect sense to use the
> > bounce buffers when they are NEEDED.
> > Forcing the user to specify iommu_platform just because you need bounce buffers
> > really feels wrong. And obviously we have a severe performance issue
> > because of the indirections.
>
> The point is that the user should not have to specify iommu_platform.
> We need to make sure any new hypervisor (especially one that might require
> bounce buffering) always sets it,

So, I have draft qemu patches which enable iommu_platform by default.
But that's really because of other problems with !iommu_platform, not
anything to do with bounce buffering or secure VMs.

The thing is that the hypervisor *doesn't* require bounce buffering.
In the POWER (and maybe s390 as well) models for Secure VMs, it's the
*guest*'s choice to enter secure mode, so the hypervisor has no reason
to know whether the guest needs bounce buffering. As far as the
hypervisor and qemu are concerned, that's a guest-internal detail; they
just expect to get addresses they can access, whether those are GPAs
(iommu_platform=off) or IOVAs (iommu_platform=on).

> as was a rather bogus legacy hack

It was certainly a bad idea, but it was a bad idea that went into a
public spec and has been widely deployed for many years. We can't
just pretend it didn't happen and move on.

Turning iommu_platform=on by default breaks old guests, some of which
we still care about. We can't (automatically) do it only for guests
that need bounce buffering, because the hypervisor doesn't know that
ahead of time.

> that isn't extensibe for cases that for example require bounce buffering.

In fact bounce buffering isn't really the issue from the hypervisor
(or spec's) point of view. It's the fact that not all of guest memory
is accessible to the hypervisor. Bounce buffering is just one way the
guest might deal with that.
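
(To make the "guest's choice" concrete: on s390 that choice surfaces to the
DMA layer through an arch hook along the following lines. This is a sketch in
the spirit of the protected-virtualization setup, paraphrased from memory;
treat names like is_prot_virt_guest() and the exact call sequence as recalled
rather than authoritative.)

/* Sketch of s390-style arch hooks, paraphrased from memory. */
#include <linux/mem_encrypt.h>  /* force_dma_unencrypted() prototype, after patch 1/2 */
#include <linux/swiotlb.h>
#include <asm/uv.h>             /* is_prot_virt_guest() */

/*
 * The guest entered protected mode, so the host cannot read its memory:
 * every DMA buffer has to be bounced through shared (unencrypted) pages.
 */
bool force_dma_unencrypted(struct device *dev)
{
	return is_prot_virt_guest();
}

/* Early setup: force use of the swiotlb bounce buffer for streaming DMA. */
static void pv_dma_init(void)
{
	if (!is_prot_virt_guest())
		return;
	swiotlb_force = SWIOTLB_FORCE;
}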

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2020-02-21 13:07:38

by Halil Pasic

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

On Fri, 21 Feb 2020 14:27:27 +1100
David Gibson <[email protected]> wrote:

> On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > From a users perspective it makes absolutely perfect sense to use the
> > > bounce buffers when they are NEEDED.
> > > Forcing the user to specify iommu_platform just because you need bounce buffers
> > > really feels wrong. And obviously we have a severe performance issue
> > > because of the indirections.
> >
> > The point is that the user should not have to specify iommu_platform.
> > We need to make sure any new hypervisor (especially one that might require
> > bounce buffering) always sets it,
>
> So, I have draft qemu patches which enable iommu_platform by default.
> But that's really because of other problems with !iommu_platform, not
> anything to do with bounce buffering or secure VMs.
>
> The thing is that the hypervisor *doesn't* require bounce buffering.
> In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> *guest*'s choice to enter secure mode, so the hypervisor has no reason
> to know whether the guest needs bounce buffering. As far as the
> hypervisor and qemu are concerned that's a guest internal detail, it
> just expects to get addresses it can access whether those are GPAs
> (iommu_platform=off) or IOVAs (iommu_platform=on).

I very much agree!

>
> > as was a rather bogus legacy hack
>
> It was certainly a bad idea, but it was a bad idea that went into a
> public spec and has been widely deployed for many years. We can't
> just pretend it didn't happen and move on.
>
> Turning iommu_platform=on by default breaks old guests, some of which
> we still care about. We can't (automatically) do it only for guests
> that need bounce buffering, because the hypervisor doesn't know that
> ahead of time.

Turning iommu_platform=on for virtio-ccw makes no sense whatsoever,
because for CCW I/O there is no such thing as an IOMMU and the addresses
are always physical addresses.

>
> > that isn't extensibe for cases that for example require bounce buffering.
>
> In fact bounce buffering isn't really the issue from the hypervisor
> (or spec's) point of view. It's the fact that not all of guest memory
> is accessible to the hypervisor. Bounce buffering is just one way the
> guest might deal with that.
>

Agreed.

Regards,
Halil





2020-02-21 15:48:57

by Michael S. Tsirkin

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:
> On Fri, 21 Feb 2020 14:27:27 +1100
> David Gibson <[email protected]> wrote:
>
> > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > > From a users perspective it makes absolutely perfect sense to use the
> > > > bounce buffers when they are NEEDED.
> > > > Forcing the user to specify iommu_platform just because you need bounce buffers
> > > > really feels wrong. And obviously we have a severe performance issue
> > > > because of the indirections.
> > >
> > > The point is that the user should not have to specify iommu_platform.
> > > We need to make sure any new hypervisor (especially one that might require
> > > bounce buffering) always sets it,
> >
> > So, I have draft qemu patches which enable iommu_platform by default.
> > But that's really because of other problems with !iommu_platform, not
> > anything to do with bounce buffering or secure VMs.
> >
> > The thing is that the hypervisor *doesn't* require bounce buffering.
> > In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> > *guest*'s choice to enter secure mode, so the hypervisor has no reason
> > to know whether the guest needs bounce buffering. As far as the
> > hypervisor and qemu are concerned that's a guest internal detail, it
> > just expects to get addresses it can access whether those are GPAs
> > (iommu_platform=off) or IOVAs (iommu_platform=on).
>
> I very much agree!
>
> >
> > > as was a rather bogus legacy hack
> >
> > It was certainly a bad idea, but it was a bad idea that went into a
> > public spec and has been widely deployed for many years. We can't
> > just pretend it didn't happen and move on.
> >
> > Turning iommu_platform=on by default breaks old guests, some of which
> > we still care about. We can't (automatically) do it only for guests
> > that need bounce buffering, because the hypervisor doesn't know that
> > ahead of time.
>
> Turning iommu_platform=on for virtio-ccw makes no sense whatsover,
> because for CCW I/O there is no such thing as IOMMU and the addresses
> are always physical addresses.

Fix the name then. The spec calls it ACCESS_PLATFORM now, which
makes much more sense.

> >
> > > that isn't extensibe for cases that for example require bounce buffering.
> >
> > In fact bounce buffering isn't really the issue from the hypervisor
> > (or spec's) point of view. It's the fact that not all of guest memory
> > is accessible to the hypervisor. Bounce buffering is just one way the
> > guest might deal with that.
> >
>
> Agreed.
>
> Regards,
> Halil
>
>
>


2020-02-21 18:07:32

by Halil Pasic

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

On Fri, 21 Feb 2020 10:48:15 -0500
"Michael S. Tsirkin" <[email protected]> wrote:

> On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:
> > On Fri, 21 Feb 2020 14:27:27 +1100
> > David Gibson <[email protected]> wrote:
> >
> > > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > > > From a users perspective it makes absolutely perfect sense to use the
> > > > > bounce buffers when they are NEEDED.
> > > > > Forcing the user to specify iommu_platform just because you need bounce buffers
> > > > > really feels wrong. And obviously we have a severe performance issue
> > > > > because of the indirections.
> > > >
> > > > The point is that the user should not have to specify iommu_platform.
> > > > We need to make sure any new hypervisor (especially one that might require
> > > > bounce buffering) always sets it,
> > >
> > > So, I have draft qemu patches which enable iommu_platform by default.
> > > But that's really because of other problems with !iommu_platform, not
> > > anything to do with bounce buffering or secure VMs.
> > >
> > > The thing is that the hypervisor *doesn't* require bounce buffering.
> > > In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> > > *guest*'s choice to enter secure mode, so the hypervisor has no reason
> > > to know whether the guest needs bounce buffering. As far as the
> > > hypervisor and qemu are concerned that's a guest internal detail, it
> > > just expects to get addresses it can access whether those are GPAs
> > > (iommu_platform=off) or IOVAs (iommu_platform=on).
> >
> > I very much agree!
> >
> > >
> > > > as was a rather bogus legacy hack
> > >
> > > It was certainly a bad idea, but it was a bad idea that went into a
> > > public spec and has been widely deployed for many years. We can't
> > > just pretend it didn't happen and move on.
> > >
> > > Turning iommu_platform=on by default breaks old guests, some of which
> > > we still care about. We can't (automatically) do it only for guests
> > > that need bounce buffering, because the hypervisor doesn't know that
> > > ahead of time.
> >
> > Turning iommu_platform=on for virtio-ccw makes no sense whatsover,
> > because for CCW I/O there is no such thing as IOMMU and the addresses
> > are always physical addresses.
>
> Fix the name then. The spec calls is ACCESS_PLATFORM now, which
> makes much more sense.

I don't quite get it. Sorry. Maybe I will revisit this later.

Regards,
Halil

>
> > >
> > > > that isn't extensibe for cases that for example require bounce buffering.
> > >
> > > In fact bounce buffering isn't really the issue from the hypervisor
> > > (or spec's) point of view. It's the fact that not all of guest memory
> > > is accessible to the hypervisor. Bounce buffering is just one way the
> > > guest might deal with that.
> > >
> >
> > Agreed.
> >
> > Regards,
> > Halil
> >
> >
> >
>
>

2020-02-24 06:45:13

by David Gibson

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

On Fri, Feb 21, 2020 at 07:07:02PM +0100, Halil Pasic wrote:
> On Fri, 21 Feb 2020 10:48:15 -0500
> "Michael S. Tsirkin" <[email protected]> wrote:
>
> > On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:
> > > On Fri, 21 Feb 2020 14:27:27 +1100
> > > David Gibson <[email protected]> wrote:
> > >
> > > > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > > > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > > > > From a users perspective it makes absolutely perfect sense to use the
> > > > > > bounce buffers when they are NEEDED.
> > > > > > Forcing the user to specify iommu_platform just because you need bounce buffers
> > > > > > really feels wrong. And obviously we have a severe performance issue
> > > > > > because of the indirections.
> > > > >
> > > > > The point is that the user should not have to specify iommu_platform.
> > > > > We need to make sure any new hypervisor (especially one that might require
> > > > > bounce buffering) always sets it,
> > > >
> > > > So, I have draft qemu patches which enable iommu_platform by default.
> > > > But that's really because of other problems with !iommu_platform, not
> > > > anything to do with bounce buffering or secure VMs.
> > > >
> > > > The thing is that the hypervisor *doesn't* require bounce buffering.
> > > > In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> > > > *guest*'s choice to enter secure mode, so the hypervisor has no reason
> > > > to know whether the guest needs bounce buffering. As far as the
> > > > hypervisor and qemu are concerned that's a guest internal detail, it
> > > > just expects to get addresses it can access whether those are GPAs
> > > > (iommu_platform=off) or IOVAs (iommu_platform=on).
> > >
> > > I very much agree!
> > >
> > > >
> > > > > as was a rather bogus legacy hack
> > > >
> > > > It was certainly a bad idea, but it was a bad idea that went into a
> > > > public spec and has been widely deployed for many years. We can't
> > > > just pretend it didn't happen and move on.
> > > >
> > > > Turning iommu_platform=on by default breaks old guests, some of which
> > > > we still care about. We can't (automatically) do it only for guests
> > > > that need bounce buffering, because the hypervisor doesn't know that
> > > > ahead of time.
> > >
> > > Turning iommu_platform=on for virtio-ccw makes no sense whatsover,
> > > because for CCW I/O there is no such thing as IOMMU and the addresses
> > > are always physical addresses.
> >
> > Fix the name then. The spec calls is ACCESS_PLATFORM now, which
> > makes much more sense.
>
> I don't quite get it. Sorry. Maybe I will revisit this later.

Halil, I think I can clarify this.

The "iommu_platform" flag doesn't necessarily have anything to do with
an iommu, although it often will. Basically it means "access guest
memory via the bus's normal DMA mechanism" rather than "access guest
memory using GPA, because you're the hypervisor and you can do that".

For the case of ccw, both mechanisms end up being the same thing,
since CCW's normal DMA *is* untranslated GPA access.

For this reason, the flag in the spec was renamed to ACCESS_PLATFORM,
but the flag in qemu still has the old name.

AIUI, Michael is saying you could trivially change the name in qemu
(obviously you'd need to alias the old name to the new one for
compatibility).


Actually, the fact that ccw has no translation makes things easier for
you: you don't really have any impediment to turning ACCESS_PLATFORM
on by default, since it doesn't make any real change to how things
work.

The remaining difficulty is that the virtio driver - since it can sit
on multiple buses - won't know this, and will reject the
ACCESS_PLATFORM flag, even though it could just do what it normally
does on ccw and it would work.

For that case, we could consider a hack in qemu where for virtio-ccw
devices *only* we allow the guest to nack the ACCESS_PLATFORM flag and
carry on anyway. Normally we insist that the guest accept the
ACCESS_PLATFORM flag if offered, because on most platforms they
*don't* amount to the same thing.
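
(The insistence mentioned above is enforced on the device side; in QEMU it
sits, roughly, in virtio_validate_features(), paraphrased from memory below.
A ccw-only exemption would have to carve a hole in exactly that first check;
the comment only marks the hypothetical spot, no such code exists today.)

/* Roughly hw/virtio/virtio.c, paraphrased from memory. Context: QEMU. */
static int virtio_validate_features(VirtIODevice *vdev)
{
    VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);

    if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) &&
        !virtio_vdev_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
        /*
         * Hypothetical virtio-ccw-only hack discussed above: tolerate the
         * guest nacking the flag here instead of failing the device.
         */
        return -EFAULT;
    }

    return k->validate_features ? k->validate_features(vdev) : 0;
}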

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



2020-02-24 18:51:33

by Halil Pasic

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

On Mon, 24 Feb 2020 14:33:14 +1100
David Gibson <[email protected]> wrote:

> On Fri, Feb 21, 2020 at 07:07:02PM +0100, Halil Pasic wrote:
> > On Fri, 21 Feb 2020 10:48:15 -0500
> > "Michael S. Tsirkin" <[email protected]> wrote:
> >
> > > On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:
> > > > On Fri, 21 Feb 2020 14:27:27 +1100
> > > > David Gibson <[email protected]> wrote:
> > > >
> > > > > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > > > > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > > > > > From a users perspective it makes absolutely perfect sense to use the
> > > > > > > bounce buffers when they are NEEDED.
> > > > > > > Forcing the user to specify iommu_platform just because you need bounce buffers
> > > > > > > really feels wrong. And obviously we have a severe performance issue
> > > > > > > because of the indirections.
> > > > > >
> > > > > > The point is that the user should not have to specify iommu_platform.
> > > > > > We need to make sure any new hypervisor (especially one that might require
> > > > > > bounce buffering) always sets it,
> > > > >
> > > > > So, I have draft qemu patches which enable iommu_platform by default.
> > > > > But that's really because of other problems with !iommu_platform, not
> > > > > anything to do with bounce buffering or secure VMs.
> > > > >
> > > > > The thing is that the hypervisor *doesn't* require bounce buffering.
> > > > > In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> > > > > *guest*'s choice to enter secure mode, so the hypervisor has no reason
> > > > > to know whether the guest needs bounce buffering. As far as the
> > > > > hypervisor and qemu are concerned that's a guest internal detail, it
> > > > > just expects to get addresses it can access whether those are GPAs
> > > > > (iommu_platform=off) or IOVAs (iommu_platform=on).
> > > >
> > > > I very much agree!
> > > >
> > > > >
> > > > > > as was a rather bogus legacy hack
> > > > >
> > > > > It was certainly a bad idea, but it was a bad idea that went into a
> > > > > public spec and has been widely deployed for many years. We can't
> > > > > just pretend it didn't happen and move on.
> > > > >
> > > > > Turning iommu_platform=on by default breaks old guests, some of which
> > > > > we still care about. We can't (automatically) do it only for guests
> > > > > that need bounce buffering, because the hypervisor doesn't know that
> > > > > ahead of time.
> > > >
> > > > Turning iommu_platform=on for virtio-ccw makes no sense whatsover,
> > > > because for CCW I/O there is no such thing as IOMMU and the addresses
> > > > are always physical addresses.
> > >
> > > Fix the name then. The spec calls is ACCESS_PLATFORM now, which
> > > makes much more sense.
> >
> > I don't quite get it. Sorry. Maybe I will revisit this later.
>
> Halil, I think I can clarify this.
>
> The "iommu_platform" flag doesn't necessarily have anything to do with
> an iommu, although it often will. Basically it means "access guest
> memory via the bus's normal DMA mechanism" rather than "access guest
> memory using GPA, because you're the hypervisor and you can do that".
>

Unfortunately, I don't think this is what is conveyed to the end users.
Let's see what we have documented:

Neither the QEMU user documentation
(https://www.qemu.org/docs/master/qemu-doc.html) nor the online help:
qemu-system-s390x -device virtio-net-ccw,?|grep iommu
iommu_platform=<bool> - on/off (default: false)
has any documentation on it.

But libvirt does have documentation on the knob that controls
iommu_platform for QEMU (when QEMU is managed by libvirt):
"""
Virtio-related options

QEMU's virtio devices have some attributes related to the virtio
transport under the driver element: The iommu attribute enables the use
of emulated IOMMU by the device. The attribute ats controls the Address
Translation Service support for PCIe devices. This is needed to make use
of IOTLB support (see IOMMU device). Possible values are on or off.
Since 3.5.0
"""
(https://libvirt.org/formatdomain.html#elementsVirtio)

Thus it seems the only available documentation says that it "enables the use
of emulated IOMMU by the device".

And for vhost-user we have
"""
When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs). They are
translated to user addresses via the IOTLB.
"""
(docs/interop/vhost-user.rst)

> For the case of ccw, both mechanisms end up being the same thing,
> since CCW's normal DMA *is* untranslated GPA access.
>

Nod.

> For this reason, the flag in the spec was renamed to ACCESS_PLATFORM,
> but the flag in qemu still has the old name.
>

Yes, the name in the spec is more neutral.

> AIUI, Michael is saying you could trivially change the name in qemu
> (obviously you'd need to alias the old name to the new one for
> compatibility).
>

I could, and then I could also ask the libvirt guys to change <driver
iommu='X'> to <driver access_platform='X'> or similar and to change
their documentation to something that is harder to comprehend. Although
I'm not sure they would like the idea.

>
> Actually, the fact that ccw has no translation makes things easier for
> you: you don't really have any impediment to turning ACCESS_PLATFORM
> on by default, since it doesn't make any real change to how things
> work.

Yeah, in theory it should not, but in practice it currently does.
vhost will try to do the IOTLB dance (Jason has a patch that
should help with that), and we get the 'use DMA API' side effects in the
guest (e.g. the virtqueue's data goes below 2G, plus some overhead).
>
> The remaining difficulty is that the virtio driver - since it can sit
> on multiple buses - won't know this, and will reject the
> ACCESS_PLATFORM flag, even though it could just do what it normally
> does on ccw and it would work.

Right, ACCESS_PLATFORM is a funny feature where the device refuses to
work if the driver does not ack it.

>
> For that case, we could consider a hack in qemu where for virtio-ccw
> devices *only* we allow the guest to nack the ACCESS_PLATFORM flag and
> carry on anyway. Normally we insist that the guest accept the
> ACCESS_PLATFORM flag if offered, because on most platforms they
> *don't* amount to the same thing.

Jason found a nice way to differentiate between 'needs translation' and
'does not need translation'. But that patch still requires the ack by the
driver (and, as Michael has pointed out, we have to consider migration).

I'm afraid that F_IOMMU_PLATFORM means different things in different
contexts, and that this ain't sufficiently documented. I'm tempted to do
a proper write-up on this (let's hope my motivation and my time
will allow). I would also very much like to have Conny's opinion on this.

Regards,
Halil








2020-02-25 18:11:17

by Cornelia Huck

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

On Mon, 24 Feb 2020 19:49:53 +0100
Halil Pasic <[email protected]> wrote:

> On Mon, 24 Feb 2020 14:33:14 +1100
> David Gibson <[email protected]> wrote:
>
> > On Fri, Feb 21, 2020 at 07:07:02PM +0100, Halil Pasic wrote:
> > > On Fri, 21 Feb 2020 10:48:15 -0500
> > > "Michael S. Tsirkin" <[email protected]> wrote:
> > >
> > > > On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:
> > > > > On Fri, 21 Feb 2020 14:27:27 +1100
> > > > > David Gibson <[email protected]> wrote:
> > > > >
> > > > > > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > > > > > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > > > > > > From a users perspective it makes absolutely perfect sense to use the
> > > > > > > > bounce buffers when they are NEEDED.
> > > > > > > > Forcing the user to specify iommu_platform just because you need bounce buffers
> > > > > > > > really feels wrong. And obviously we have a severe performance issue
> > > > > > > > because of the indirections.
> > > > > > >
> > > > > > > The point is that the user should not have to specify iommu_platform.
> > > > > > > We need to make sure any new hypervisor (especially one that might require
> > > > > > > bounce buffering) always sets it,
> > > > > >
> > > > > > So, I have draft qemu patches which enable iommu_platform by default.
> > > > > > But that's really because of other problems with !iommu_platform, not
> > > > > > anything to do with bounce buffering or secure VMs.
> > > > > >
> > > > > > The thing is that the hypervisor *doesn't* require bounce buffering.
> > > > > > In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> > > > > > *guest*'s choice to enter secure mode, so the hypervisor has no reason
> > > > > > to know whether the guest needs bounce buffering. As far as the
> > > > > > hypervisor and qemu are concerned that's a guest internal detail, it
> > > > > > just expects to get addresses it can access whether those are GPAs
> > > > > > (iommu_platform=off) or IOVAs (iommu_platform=on).
> > > > >
> > > > > I very much agree!
> > > > >
> > > > > >
> > > > > > > as was a rather bogus legacy hack
> > > > > >
> > > > > > It was certainly a bad idea, but it was a bad idea that went into a
> > > > > > public spec and has been widely deployed for many years. We can't
> > > > > > just pretend it didn't happen and move on.
> > > > > >
> > > > > > Turning iommu_platform=on by default breaks old guests, some of which
> > > > > > we still care about. We can't (automatically) do it only for guests
> > > > > > that need bounce buffering, because the hypervisor doesn't know that
> > > > > > ahead of time.

We could default to iommu_platform=on on s390 when the host has active
support for protected virtualization... but that's just another kind of
horrible, so let's just pretend I didn't suggest it.

> > > > >
> > > > > Turning iommu_platform=on for virtio-ccw makes no sense whatsover,
> > > > > because for CCW I/O there is no such thing as IOMMU and the addresses
> > > > > are always physical addresses.
> > > >
> > > > Fix the name then. The spec calls is ACCESS_PLATFORM now, which
> > > > makes much more sense.
> > >
> > > I don't quite get it. Sorry. Maybe I will revisit this later.
> >
> > Halil, I think I can clarify this.
> >
> > The "iommu_platform" flag doesn't necessarily have anything to do with
> > an iommu, although it often will. Basically it means "access guest
> > memory via the bus's normal DMA mechanism" rather than "access guest
> > memory using GPA, because you're the hypervisor and you can do that".
> >
>
> Unfortunately, I don't think this is what is conveyed to the end users.
> Let's see what do we have documented:
>
> Neither Qemu user documentation
> (https://www.qemu.org/docs/master/qemu-doc.html) nor online help:
> qemu-system-s390x -device virtio-net-ccw,?|grep iommu
> iommu_platform=<bool> - on/off (default: false)
> has any documentation on it.

Now, that's 'helpful' -- this certainly calls out for a bit of doc...

>
> But libvirt does have have documenttion on the knob that contros
> iommu_platform for QEMU (when QEMU is managed by libvirt):
> """
> Virtio-related options
>
> QEMU's virtio devices have some attributes related to the virtio
> transport under the driver element: The iommu attribute enables the use
> of emulated IOMMU by the device. The attribute ats controls the Address
> Translation Service support for PCIe devices. This is needed to make use
> of IOTLB support (see IOMMU device). Possible values are on or off.
> Since 3.5.0
> """
> (https://libvirt.org/formatdomain.html#elementsVirtio)
>
> Thus it seems the only available documentation says that it "enables the use
> of emulated IOMMU by the device".
>
> And for vhost-user we have
> """
> When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:
>
> * Guest addresses map to the vhost memory region containing that guest
> address.
>
> When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:
>
> * Guest addresses are also called I/O virtual addresses (IOVAs). They are
> translated to user addresses via the IOTLB.
> """
> (docs/interop/vhost-user.rst)
>
> > For the case of ccw, both mechanisms end up being the same thing,
> > since CCW's normal DMA *is* untranslated GPA access.
> >
>
> Nod.
>
> > For this reason, the flag in the spec was renamed to ACCESS_PLATFORM,
> > but the flag in qemu still has the old name.
> >
>
> Yes, the name in the spec is more neutral.
>
> > AIUI, Michael is saying you could trivially change the name in qemu
> > (obviously you'd need to alias the old name to the new one for
> > compatibility).
> >
>
> I could, and the I could also ask the libvirt guys to change <driver
> iommu='X'> to <driver access_platform='X'> or similar and to change
> their documentation to something that is harder to comprehend. Although
> I'm not sure they would like the idea.

Hopefully, the documentation can be changed to something that is _not_
harder to comprehend :) (with a bit more text, I suppose.) Renaming to
something like access_platform seems like a good idea, even with the
required compat dance.

>
> >
> > Actually, the fact that ccw has no translation makes things easier for
> > you: you don't really have any impediment to turning ACCESS_PLATFORM
> > on by default, since it doesn't make any real change to how things
> > work.
>
> Yeah, it should not, in theory, but currently it does in practice.
> Currently vhost will try to do the IOTLB dance (Jason has a patch that
> should help with that), and we get the 'use dma api' side effects in the
> guest (e.g. virtqueue's data go <2G + some overhead).

Nod.

>
> >
> > The remaining difficulty is that the virtio driver - since it can sit
> > on multiple buses - won't know this, and will reject the
> > ACCESS_PLATFORM flag, even though it could just do what it normally
> > does on ccw and it would work.
>
> Right ACCESS_PLATFORM is a funny feature where the device refuses to
> work if the driver does not ack.
>
> >
> > For that case, we could consider a hack in qemu where for virtio-ccw
> > devices *only* we allow the guest to nack the ACCESS_PLATFORM flag and
> > carry on anyway. Normally we insist that the guest accept the
> > ACCESS_PLATFORM flag if offered, because on most platforms they
> > *don't* amount to the same thing.
>
> Jason found a nice way to differentiate between needs translation and
> does not need translation. But that patch still requires the ack by the
> driver (and as Michael has pointed out we have to consider migration).
>
> I'm afraid that F_IOMMU_PLATFORM means different things in different
> contexts, and that this ain't sufficiently documented. I'm tempted to do
> a proper write-up on this (let's hope my motivation will and my time
> will allow). I would also very much like to have Conny's opinion on this.

More documentation is never a bad idea; but I'm afraid I don't have any
further insights at the moment.



2020-02-28 01:05:44

by David Gibson

Subject: Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

On Tue, Feb 25, 2020 at 07:08:02PM +0100, Cornelia Huck wrote:
> On Mon, 24 Feb 2020 19:49:53 +0100
> Halil Pasic <[email protected]> wrote:
>
> > On Mon, 24 Feb 2020 14:33:14 +1100
> > David Gibson <[email protected]> wrote:
> >
> > > On Fri, Feb 21, 2020 at 07:07:02PM +0100, Halil Pasic wrote:
> > > > On Fri, 21 Feb 2020 10:48:15 -0500
> > > > "Michael S. Tsirkin" <[email protected]> wrote:
> > > >
> > > > > On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:
> > > > > > On Fri, 21 Feb 2020 14:27:27 +1100
> > > > > > David Gibson <[email protected]> wrote:
> > > > > >
> > > > > > > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > > > > > > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > > > > > > > From a users perspective it makes absolutely perfect sense to use the
> > > > > > > > > bounce buffers when they are NEEDED.
> > > > > > > > > Forcing the user to specify iommu_platform just because you need bounce buffers
> > > > > > > > > really feels wrong. And obviously we have a severe performance issue
> > > > > > > > > because of the indirections.
> > > > > > > >
> > > > > > > > The point is that the user should not have to specify iommu_platform.
> > > > > > > > We need to make sure any new hypervisor (especially one that might require
> > > > > > > > bounce buffering) always sets it,
> > > > > > >
> > > > > > > So, I have draft qemu patches which enable iommu_platform by default.
> > > > > > > But that's really because of other problems with !iommu_platform, not
> > > > > > > anything to do with bounce buffering or secure VMs.
> > > > > > >
> > > > > > > The thing is that the hypervisor *doesn't* require bounce buffering.
> > > > > > > In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> > > > > > > *guest*'s choice to enter secure mode, so the hypervisor has no reason
> > > > > > > to know whether the guest needs bounce buffering. As far as the
> > > > > > > hypervisor and qemu are concerned that's a guest internal detail, it
> > > > > > > just expects to get addresses it can access whether those are GPAs
> > > > > > > (iommu_platform=off) or IOVAs (iommu_platform=on).
> > > > > >
> > > > > > I very much agree!
> > > > > >
> > > > > > >
> > > > > > > > as was a rather bogus legacy hack
> > > > > > >
> > > > > > > It was certainly a bad idea, but it was a bad idea that went into a
> > > > > > > public spec and has been widely deployed for many years. We can't
> > > > > > > just pretend it didn't happen and move on.
> > > > > > >
> > > > > > > Turning iommu_platform=on by default breaks old guests, some of which
> > > > > > > we still care about. We can't (automatically) do it only for guests
> > > > > > > that need bounce buffering, because the hypervisor doesn't know that
> > > > > > > ahead of time.
>
> We could default to iommu_platform=on on s390 when the host has active
> support for protected virtualization... but that's just another kind of
> horrible, so let's just pretend I didn't suggest it.

Yeah, that would break migration between hosts with the feature and
hosts without - for everything, not just protected guests. In general
any kind of guest-visible configuration change based on host
properties is incompatible with the qemu/KVM migration model.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

