2023-09-18 15:16:08

by Niklas Schnelle

[permalink] [raw]
Subject: [PATCH v2 0/2] iommu/virtio: Enable IOMMU_CAP_DERRED_FLUSH

Hi All,

Previously I used virtio-iommu as a non-s390x test vehicle[0] for the
single queue flushing scheme introduced by my s390x DMA API conversion
series[1]. For this I modified virtio-iommu to a) use .iotlb_sync_map
and b) enable IOMMU_CAP_DEFERRED_FLUSH. It turned out that deferred
flush and even just the introduction of ops->iotlb_sync_map yield
performance uplift[2] even with per-CPU queues. So here is a small
series of these two changes. This still applies on top of my series[1]
because its first patch titled "iommu: Allow .iotlb_sync_map to fail and
handle s390's -ENOMEM return" enable ops->iotlb_sync_map to return
errors and virtio-iommu's sync can fail. This also makes sure there is
no merge conflict with that series.

The code is also available on the b4/viommu-deferred-flush branch of my
kernel.org git repository[3]

Thanks,
Niklas

[0] https://lore.kernel.org/lkml/[email protected]/
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/log/?h=b4/viommu-deferred-flush

Signed-off-by: Niklas Schnelle <[email protected]>
---
Changes in v2:
- Check for viommu == NULL in viommu_sync_req() instead of for
0 endpoints in ops (Jean-Philippe)
- Added comment where viommu can be NULL (me)
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Niklas Schnelle (2):
iommu/virtio: Make use of ops->iotlb_sync_map
iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush

drivers/iommu/virtio-iommu.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
---
base-commit: e165388f6d32dc8a49f49ef6e80584ad3def3d78
change-id: 20230825-viommu-sync-map-1bf0cc4fdc15

Best regards,
--
Niklas Schnelle
Linux on Z Development

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement - https://www.ibm.com/privacy


2023-09-18 20:01:56

by Niklas Schnelle

[permalink] [raw]
Subject: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

Pull out the sync operation from viommu_map_pages() by implementing
ops->iotlb_sync_map. This allows the common IOMMU code to map multiple
elements of an sg with a single sync (see iommu_map_sg()). Furthermore,
it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH.

Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Niklas Schnelle <[email protected]>
---
drivers/iommu/virtio-iommu.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 17dcd826f5c2..3649586f0e5c 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
int ret;
unsigned long flags;

+ /*
+ * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
+ * is initialized e.g. via iommu_create_device_direct_mappings()
+ */
+ if (!viommu)
+ return 0;
spin_lock_irqsave(&viommu->request_lock, flags);
ret = __viommu_sync_req(viommu);
if (ret)
@@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova,
.flags = cpu_to_le32(flags),
};

- ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
+ ret = viommu_add_req(vdomain->viommu, &map, sizeof(map));
if (ret) {
viommu_del_mappings(vdomain, iova, end);
return ret;
@@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain,
viommu_sync_req(vdomain->viommu);
}

+static int viommu_iotlb_sync_map(struct iommu_domain *domain,
+ unsigned long iova, size_t size)
+{
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+ return viommu_sync_req(vdomain->viommu);
+}
+
static void viommu_get_resv_regions(struct device *dev, struct list_head *head)
{
struct iommu_resv_region *entry, *new_entry, *msi = NULL;
@@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = {
.unmap_pages = viommu_unmap_pages,
.iova_to_phys = viommu_iova_to_phys,
.iotlb_sync = viommu_iotlb_sync,
+ .iotlb_sync_map = viommu_iotlb_sync_map,
.free = viommu_domain_free,
}
};

--
2.39.2

2023-09-18 20:15:42

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On Mon, Sep 18, 2023 at 01:51:43PM +0200, Niklas Schnelle wrote:
> Pull out the sync operation from viommu_map_pages() by implementing
> ops->iotlb_sync_map. This allows the common IOMMU code to map multiple
> elements of an sg with a single sync (see iommu_map_sg()). Furthermore,
> it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH.
>
> Link: https://lore.kernel.org/lkml/[email protected]/
> Signed-off-by: Niklas Schnelle <[email protected]>

Reviewed-by: Jean-Philippe Brucker <[email protected]>

This must be merged after "iommu/dma: s390 DMA API conversion and
optimized IOTLB flushing" because of the updated iotlb_sync_map()
prototype.

Thanks,
Jean

> ---
> drivers/iommu/virtio-iommu.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 17dcd826f5c2..3649586f0e5c 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
> int ret;
> unsigned long flags;
>
> + /*
> + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> + * is initialized e.g. via iommu_create_device_direct_mappings()
> + */
> + if (!viommu)
> + return 0;
> spin_lock_irqsave(&viommu->request_lock, flags);
> ret = __viommu_sync_req(viommu);
> if (ret)
> @@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova,
> .flags = cpu_to_le32(flags),
> };
>
> - ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
> + ret = viommu_add_req(vdomain->viommu, &map, sizeof(map));
> if (ret) {
> viommu_del_mappings(vdomain, iova, end);
> return ret;
> @@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain,
> viommu_sync_req(vdomain->viommu);
> }
>
> +static int viommu_iotlb_sync_map(struct iommu_domain *domain,
> + unsigned long iova, size_t size)
> +{
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> + return viommu_sync_req(vdomain->viommu);
> +}
> +
> static void viommu_get_resv_regions(struct device *dev, struct list_head *head)
> {
> struct iommu_resv_region *entry, *new_entry, *msi = NULL;
> @@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = {
> .unmap_pages = viommu_unmap_pages,
> .iova_to_phys = viommu_iova_to_phys,
> .iotlb_sync = viommu_iotlb_sync,
> + .iotlb_sync_map = viommu_iotlb_sync_map,
> .free = viommu_domain_free,
> }
> };
>
> --
> 2.39.2
>

2023-09-19 08:30:49

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On 2023-09-19 09:15, Jean-Philippe Brucker wrote:
> On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
>>> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
>>> index 17dcd826f5c2..3649586f0e5c 100644
>>> --- a/drivers/iommu/virtio-iommu.c
>>> +++ b/drivers/iommu/virtio-iommu.c
>>> @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
>>> int ret;
>>> unsigned long flags;
>>> + /*
>>> + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
>>> + * is initialized e.g. via iommu_create_device_direct_mappings()
>>> + */
>>> + if (!viommu)
>>> + return 0;
>>
>> Minor nit: I'd be inclined to make that check explicitly in the places where
>> it definitely is expected, rather than allowing *any* sync to silently do
>> nothing if called incorrectly. Plus then they could use
>> vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
>> (it did take me a moment to figure out how we could get to .iotlb_sync_map
>> with a NULL viommu without viommu_map_pages() blowing up first...)
>
> They're not strictly equivalent: this check works around a temporary issue
> with the IOMMU core, which calls map/unmap before the domain is finalized.
> Once we merge domain_alloc() and finalize(), then this check disappears,
> but we still need to test nr_endpoints in map/unmap to handle detached
> domains (and we still need to fix the synchronization of nr_endpoints
> against attach/detach). That's why I preferred doing this on viommu and
> keeping it in one place.

Fair enough - it just seems to me that in both cases it's a detached
domain, so its previous history of whether it's ever been otherwise or
not shouldn't matter. Even once viommu is initialised, does it really
make sense to send sync commands for a mapping on a detached domain
where we haven't actually sent any map/unmap commands?

Thanks,
Robin.

2023-09-19 16:51:40

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On Tue, Sep 19, 2023 at 09:15:19AM +0100, Jean-Philippe Brucker wrote:
> On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
> > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > > index 17dcd826f5c2..3649586f0e5c 100644
> > > --- a/drivers/iommu/virtio-iommu.c
> > > +++ b/drivers/iommu/virtio-iommu.c
> > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
> > > int ret;
> > > unsigned long flags;
> > > + /*
> > > + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> > > + * is initialized e.g. via iommu_create_device_direct_mappings()
> > > + */
> > > + if (!viommu)
> > > + return 0;
> >
> > Minor nit: I'd be inclined to make that check explicitly in the places where
> > it definitely is expected, rather than allowing *any* sync to silently do
> > nothing if called incorrectly. Plus then they could use
> > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
> > (it did take me a moment to figure out how we could get to .iotlb_sync_map
> > with a NULL viommu without viommu_map_pages() blowing up first...)

This makes more sense to me

Ultimately this driver should reach a point where every iommu_domain
always has a non-null domain->viommu because it will be set during
alloc.

But it can still have nr_endpoints == 0, doesn't it make sense to
avoid sync in this case?

(btw this driver is missing locking around vdomain->nr_endpoints)

> They're not strictly equivalent: this check works around a temporary issue
> with the IOMMU core, which calls map/unmap before the domain is
> finalized.

Where? The above points to iommu_create_device_direct_mappings() but
it doesn't because the pgsize_bitmap == 0:

static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
struct device *dev)
{
struct iommu_resv_region *entry;
struct list_head mappings;
unsigned long pg_size;
int ret = 0;

pg_size = domain->pgsize_bitmap ? 1UL << __ffs(domain->pgsize_bitmap) : 0;
INIT_LIST_HEAD(&mappings);

if (WARN_ON_ONCE(iommu_is_dma_domain(domain) && !pg_size))

Indeed, the driver should be failing all map's until the domain is
finalized because it has no way to check the IOVA matches the eventual
aperture.

Jason

2023-09-22 07:56:02

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On Tue, Sep 19, 2023 at 09:28:08AM +0100, Robin Murphy wrote:
> On 2023-09-19 09:15, Jean-Philippe Brucker wrote:
> > On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > > > index 17dcd826f5c2..3649586f0e5c 100644
> > > > --- a/drivers/iommu/virtio-iommu.c
> > > > +++ b/drivers/iommu/virtio-iommu.c
> > > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
> > > > int ret;
> > > > unsigned long flags;
> > > > + /*
> > > > + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> > > > + * is initialized e.g. via iommu_create_device_direct_mappings()
> > > > + */
> > > > + if (!viommu)
> > > > + return 0;
> > >
> > > Minor nit: I'd be inclined to make that check explicitly in the places where
> > > it definitely is expected, rather than allowing *any* sync to silently do
> > > nothing if called incorrectly. Plus then they could use
> > > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
> > > (it did take me a moment to figure out how we could get to .iotlb_sync_map
> > > with a NULL viommu without viommu_map_pages() blowing up first...)
> >
> > They're not strictly equivalent: this check works around a temporary issue
> > with the IOMMU core, which calls map/unmap before the domain is finalized.
> > Once we merge domain_alloc() and finalize(), then this check disappears,
> > but we still need to test nr_endpoints in map/unmap to handle detached
> > domains (and we still need to fix the synchronization of nr_endpoints
> > against attach/detach). That's why I preferred doing this on viommu and
> > keeping it in one place.
>
> Fair enough - it just seems to me that in both cases it's a detached domain,
> so its previous history of whether it's ever been otherwise or not shouldn't
> matter. Even once viommu is initialised, does it really make sense to send
> sync commands for a mapping on a detached domain where we haven't actually
> sent any map/unmap commands?

If no requests were added by map/unmap, then viommu_sync_req() is
essentially a nop because virtio doesn't use sync commands (and
virtqueue_kick() only kicks the host when the queue's not empty, if I
remember correctly). It still does a bit of work so is less efficient than
a preliminary check on nr_endpoints, but it feels nicer to streamline the
case where the domain is attached, since map/unmap on detached domains
happens rarely, if ever.

Either is fine by me. An extra test won't make much difference performance
wise, and I guess will look less confusing. Niklas, do you mind resending
the version with nr_endpoints check (and updated commit messages)?

Thanks,
Jean

2023-09-22 13:24:41

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On 22/09/2023 1:41 pm, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:
>>>> They're not strictly equivalent: this check works around a temporary issue
>>>> with the IOMMU core, which calls map/unmap before the domain is
>>>> finalized.
>>>
>>> Where? The above points to iommu_create_device_direct_mappings() but
>>> it doesn't because the pgsize_bitmap == 0:
>>
>> __iommu_domain_alloc() sets pgsize_bitmap in this case:
>>
>> /*
>> * If not already set, assume all sizes by default; the driver
>> * may override this later
>> */
>> if (!domain->pgsize_bitmap)
>> domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
>
> Dirver's shouldn't do that.
>
> The core code was fixed to try again with mapping reserved regions to
> support these kinds of drivers.

This is still the "normal" code path, really; I think it's only AMD that
started initialising the domain bitmap "early" and warranted making it
conditional. However we *do* ultimately want all the drivers to do the
same, so we can get rid of ops->pgsize_bitmap, because it's already
pretty redundant and meaningless in the face of per-domain pagetable
formats.

Thanks,
Robin.

2023-09-22 15:08:52

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On Tue, Sep 19, 2023 at 11:46:49AM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 19, 2023 at 09:15:19AM +0100, Jean-Philippe Brucker wrote:
> > On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > > > index 17dcd826f5c2..3649586f0e5c 100644
> > > > --- a/drivers/iommu/virtio-iommu.c
> > > > +++ b/drivers/iommu/virtio-iommu.c
> > > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
> > > > int ret;
> > > > unsigned long flags;
> > > > + /*
> > > > + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> > > > + * is initialized e.g. via iommu_create_device_direct_mappings()
> > > > + */
> > > > + if (!viommu)
> > > > + return 0;
> > >
> > > Minor nit: I'd be inclined to make that check explicitly in the places where
> > > it definitely is expected, rather than allowing *any* sync to silently do
> > > nothing if called incorrectly. Plus then they could use
> > > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
> > > (it did take me a moment to figure out how we could get to .iotlb_sync_map
> > > with a NULL viommu without viommu_map_pages() blowing up first...)
>
> This makes more sense to me
>
> Ultimately this driver should reach a point where every iommu_domain
> always has a non-null domain->viommu because it will be set during
> alloc.
>
> But it can still have nr_endpoints == 0, doesn't it make sense to
> avoid sync in this case?
>
> (btw this driver is missing locking around vdomain->nr_endpoints)

Yes, that's on my list

>
> > They're not strictly equivalent: this check works around a temporary issue
> > with the IOMMU core, which calls map/unmap before the domain is
> > finalized.
>
> Where? The above points to iommu_create_device_direct_mappings() but
> it doesn't because the pgsize_bitmap == 0:

__iommu_domain_alloc() sets pgsize_bitmap in this case:

/*
* If not already set, assume all sizes by default; the driver
* may override this later
*/
if (!domain->pgsize_bitmap)
domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;

Thanks,
Jean

2023-09-22 16:49:30

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:
> > > They're not strictly equivalent: this check works around a temporary issue
> > > with the IOMMU core, which calls map/unmap before the domain is
> > > finalized.
> >
> > Where? The above points to iommu_create_device_direct_mappings() but
> > it doesn't because the pgsize_bitmap == 0:
>
> __iommu_domain_alloc() sets pgsize_bitmap in this case:
>
> /*
> * If not already set, assume all sizes by default; the driver
> * may override this later
> */
> if (!domain->pgsize_bitmap)
> domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;

Dirver's shouldn't do that.

The core code was fixed to try again with mapping reserved regions to
support these kinds of drivers.

Jason

2023-09-22 18:22:55

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On 22/09/2023 5:27 pm, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 02:13:18PM +0100, Robin Murphy wrote:
>> On 22/09/2023 1:41 pm, Jason Gunthorpe wrote:
>>> On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:
>>>>>> They're not strictly equivalent: this check works around a temporary issue
>>>>>> with the IOMMU core, which calls map/unmap before the domain is
>>>>>> finalized.
>>>>>
>>>>> Where? The above points to iommu_create_device_direct_mappings() but
>>>>> it doesn't because the pgsize_bitmap == 0:
>>>>
>>>> __iommu_domain_alloc() sets pgsize_bitmap in this case:
>>>>
>>>> /*
>>>> * If not already set, assume all sizes by default; the driver
>>>> * may override this later
>>>> */
>>>> if (!domain->pgsize_bitmap)
>>>> domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
>>>
>>> Dirver's shouldn't do that.
>>>
>>> The core code was fixed to try again with mapping reserved regions to
>>> support these kinds of drivers.
>>
>> This is still the "normal" code path, really; I think it's only AMD that
>> started initialising the domain bitmap "early" and warranted making it
>> conditional.
>
> My main point was that iommu_create_device_direct_mappings() should
> fail for unfinalized domains, setting pgsize_bitmap to allow it to
> succeed is not a nice hack, and not necessary now.

Sure, but it's the whole "unfinalised domains" and rewriting
domain->pgsize_bitmap after attach thing that is itself the massive
hack. AMD doesn't do that, and doesn't need to; it knows the appropriate
format at allocation time and can quite happily return a fully working
domain which allows map before attach, but the old ops->pgsize_bitmap
mechanism fundamentally doesn't work for multiple formats with different
page sizes. The only thing I'd accuse it of doing wrong is the weird
half-and-half thing of having one format as a default via one mechanism,
and the other as an override through the other, rather than setting both
explicitly.

virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
either; it sets it once it's discovered any instance, since apparently
it's assuming that all instances must support identical page sizes, and
thus once it's seen one it can work "normally" per the core code's
assumptions. It's also I think the only driver which has a "finalise"
bodge but *can* still properly support map-before-attach, by virtue of
having to replay mappings to every new endpoint anyway.

> What do you think about something like this to replace
> iommu_create_device_direct_mappings(), that does enforce things
> properly?

I fail to see how that would make any practical difference. Either the
mappings can be correctly set up in a pagetable *before* the relevant
device is attached to that pagetable, or they can't (if the driver
doesn't have enough information to be able to do so) and we just have to
really hope nothing blows up in the race window between attaching the
device to an empty pagetable and having a second try at
iommu_create_device_direct_mappings(). That's a driver-level issue and
has nothing to do with pgsize_bitmap either way.

Thanks,
Robin.

2023-09-23 00:52:00

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On Fri, Sep 22, 2023 at 02:13:18PM +0100, Robin Murphy wrote:
> On 22/09/2023 1:41 pm, Jason Gunthorpe wrote:
> > On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:
> > > > > They're not strictly equivalent: this check works around a temporary issue
> > > > > with the IOMMU core, which calls map/unmap before the domain is
> > > > > finalized.
> > > >
> > > > Where? The above points to iommu_create_device_direct_mappings() but
> > > > it doesn't because the pgsize_bitmap == 0:
> > >
> > > __iommu_domain_alloc() sets pgsize_bitmap in this case:
> > >
> > > /*
> > > * If not already set, assume all sizes by default; the driver
> > > * may override this later
> > > */
> > > if (!domain->pgsize_bitmap)
> > > domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
> >
> > Dirver's shouldn't do that.
> >
> > The core code was fixed to try again with mapping reserved regions to
> > support these kinds of drivers.
>
> This is still the "normal" code path, really; I think it's only AMD that
> started initialising the domain bitmap "early" and warranted making it
> conditional.

My main point was that iommu_create_device_direct_mappings() should
fail for unfinalized domains, setting pgsize_bitmap to allow it to
succeed is not a nice hack, and not necessary now.

What do you think about something like this to replace
iommu_create_device_direct_mappings(), that does enforce things
properly?


static int resv_cmp(void *priv, const struct list_head *llhs,
const struct list_head *lrhs)
{
struct iommu_resv_region *lhs = list_entry(llhs, struct iommu_resv_region, list);
struct iommu_resv_region *rhs = list_entry(lrhs, struct iommu_resv_region, list);

if (lhs->start == rhs->start)
return 0;
if (lhs->start < rhs->start)
return -1;
return 1;
}

static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
struct device *dev)
{
struct iommu_resv_region *entry;
struct iommu_resv_region *tmp;
struct list_head mappings;
struct list_head direct;
phys_addr_t cur = 0;
int ret = 0;

INIT_LIST_HEAD(&mappings);
INIT_LIST_HEAD(&direct);

iommu_get_resv_regions(dev, &mappings);

list_for_each_entry_safe(entry, tmp, &mappings, list) {
if (entry->type == IOMMU_RESV_DIRECT)
dev->iommu->require_direct = 1;

if ((domain->type & __IOMMU_DOMAIN_PAGING) &&
(entry->type == IOMMU_RESV_DIRECT ||
entry->type == IOMMU_RESV_DIRECT_RELAXABLE)) {
if (domain->geometry.aperture_start > entry->start ||
domain->geometry.aperture_end == 0 ||
(domain->geometry.aperture_end - 1) <
(entry->start + entry->length - 1)) {
ret = -EINVAL;
goto out;
}
list_move(&entry->list, &direct);
}
}

if (list_empty(&direct))
goto out;

/*
* FW can have overlapping ranges, sort the list by start address
* and map any duplicated IOVA only once.
*/
list_sort(NULL, &direct, resv_cmp);
list_for_each_entry(entry, &direct, list) {
phys_addr_t start_pfn = entry->start / PAGE_SIZE;
phys_addr_t last_pfn =
(entry->length - 1 + entry->start) / PAGE_SIZE;

if (start_pfn < cur)
start_pfn = cur;

if (start_pfn <= last_pfn) {
ret = iommu_map(domain, start_pfn * PAGE_SIZE,
start_pfn * PAGE_SIZE,
(last_pfn - start_pfn + 1) * PAGE_SIZE,
entry->prot, GFP_KERNEL);
if (ret)
goto out;
cur = last_pfn + 1;
}
}

out:
list_splice(&direct, &mappings);
iommu_put_resv_regions(dev, &mappings);

return ret;
}

2023-09-23 01:56:14

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:

> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
> either; it sets it once it's discovered any instance, since apparently it's
> assuming that all instances must support identical page sizes, and thus once
> it's seen one it can work "normally" per the core code's assumptions. It's
> also I think the only driver which has a "finalise" bodge but *can* still
> properly support map-before-attach, by virtue of having to replay mappings
> to every new endpoint anyway.

Well it can't quite do that since it doesn't know the geometry - it
all is sort of guessing and hoping it doesn't explode on replay. If it
knows the geometry it wouldn't need finalize...

> > What do you think about something like this to replace
> > iommu_create_device_direct_mappings(), that does enforce things
> > properly?
>
> I fail to see how that would make any practical difference. Either the
> mappings can be correctly set up in a pagetable *before* the relevant device
> is attached to that pagetable, or they can't (if the driver doesn't have
> enough information to be able to do so) and we just have to really hope
> nothing blows up in the race window between attaching the device to an empty
> pagetable and having a second try at iommu_create_device_direct_mappings().
> That's a driver-level issue and has nothing to do with pgsize_bitmap either
> way.

Except we don't detect this in the core code correctly, that is my
point. We should detect the aperture conflict, not pgsize_bitmap to
check if it is the first or second try.

Jason

2023-09-25 04:58:14

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On 9/23/23 7:33 AM, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
>
>> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
>> either; it sets it once it's discovered any instance, since apparently it's
>> assuming that all instances must support identical page sizes, and thus once
>> it's seen one it can work "normally" per the core code's assumptions. It's
>> also I think the only driver which has a "finalise" bodge but*can* still
>> properly support map-before-attach, by virtue of having to replay mappings
>> to every new endpoint anyway.
> Well it can't quite do that since it doesn't know the geometry - it
> all is sort of guessing and hoping it doesn't explode on replay. If it
> knows the geometry it wouldn't need finalize...

The ultimate solution to this problem seems to be to add device pointer
to the parameter of ops->domain_alloc. So once the domain is allocated,
it is fully initialized. Attaching this domain to a device that is not
compatible will return -EINVAL, then the caller has to allocate a new
domain for this device.

I feel that this is not an AMD specific problem, other iommu drivers
will also encounter the similar problem sooner or later.

Best regards,
baolu

2023-09-25 13:20:06

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On 2023-09-23 00:33, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
>
>> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
>> either; it sets it once it's discovered any instance, since apparently it's
>> assuming that all instances must support identical page sizes, and thus once
>> it's seen one it can work "normally" per the core code's assumptions. It's
>> also I think the only driver which has a "finalise" bodge but *can* still
>> properly support map-before-attach, by virtue of having to replay mappings
>> to every new endpoint anyway.
>
> Well it can't quite do that since it doesn't know the geometry - it
> all is sort of guessing and hoping it doesn't explode on replay. If it
> knows the geometry it wouldn't need finalize...

I think it's entirely reasonable to assume that any direct mappings
specified for a device are valid for that device and its IOMMU. However,
in the particular case of virtio, it really shouldn't ever have direct
mappings anyway, since even if the underlying hardware did have any, the
host can enforce the actual direct-mapping aspect itself, and just
present them as unusable regions to the guest.

>>> What do you think about something like this to replace
>>> iommu_create_device_direct_mappings(), that does enforce things
>>> properly?
>>
>> I fail to see how that would make any practical difference. Either the
>> mappings can be correctly set up in a pagetable *before* the relevant device
>> is attached to that pagetable, or they can't (if the driver doesn't have
>> enough information to be able to do so) and we just have to really hope
>> nothing blows up in the race window between attaching the device to an empty
>> pagetable and having a second try at iommu_create_device_direct_mappings().
>> That's a driver-level issue and has nothing to do with pgsize_bitmap either
>> way.
>
> Except we don't detect this in the core code correctly, that is my
> point. We should detect the aperture conflict, not pgsize_bitmap to
> check if it is the first or second try.

Again, that's irrelevant. It can only be about whether the actual
->map_pages call succeeds or not. A driver could well know up-front that
all instances support the same pgsize_bitmap and aperture, and set both
at ->domain_alloc time, yet still be unable to handle an actual mapping
without knowing which instance(s) that needs to interact with (e.g.
omap-iommu).

Thanks,
Robin.

2023-09-25 14:00:37

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On Mon, Sep 25, 2023 at 10:48:21AM +0800, Baolu Lu wrote:
> On 9/23/23 7:33 AM, Jason Gunthorpe wrote:
> > On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
> >
> > > virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
> > > either; it sets it once it's discovered any instance, since apparently it's
> > > assuming that all instances must support identical page sizes, and thus once
> > > it's seen one it can work "normally" per the core code's assumptions. It's
> > > also I think the only driver which has a "finalise" bodge but*can* still
> > > properly support map-before-attach, by virtue of having to replay mappings
> > > to every new endpoint anyway.
> > Well it can't quite do that since it doesn't know the geometry - it
> > all is sort of guessing and hoping it doesn't explode on replay. If it
> > knows the geometry it wouldn't need finalize...
>
> The ultimate solution to this problem seems to be to add device pointer
> to the parameter of ops->domain_alloc. So once the domain is allocated,
> it is fully initialized. Attaching this domain to a device that is not
> compatible will return -EINVAL, then the caller has to allocate a new
> domain for this device.

Sure, it looks like this, and it works already for default domain
setup.

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 8599394c9157d7..1697f5a2370785 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -637,15 +637,10 @@ static void viommu_event_handler(struct virtqueue *vq)

/* IOMMU API */

-static struct iommu_domain *viommu_domain_alloc(unsigned type)
+static struct viommu_domain *__viommu_domain_alloc(void)
{
struct viommu_domain *vdomain;

- if (type != IOMMU_DOMAIN_UNMANAGED &&
- type != IOMMU_DOMAIN_DMA &&
- type != IOMMU_DOMAIN_IDENTITY)
- return NULL;
-
vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
if (!vdomain)
return NULL;
@@ -654,16 +649,32 @@ static struct iommu_domain *viommu_domain_alloc(unsigned type)
spin_lock_init(&vdomain->mappings_lock);
vdomain->mappings = RB_ROOT_CACHED;

+ return vdomain;
+}
+
+static struct iommu_domain *viommu_domain_alloc(unsigned type)
+{
+ struct viommu_domain *vdomain;
+
+ /*
+ * viommu sometimes creates an identity domain out of a
+ * paging domain, that can't use the global static.
+ * During attach it will fill in an identity page table.
+ */
+ if (type != IOMMU_DOMAIN_IDENTITY)
+ return NULL;
+ vdomain = __viommu_domain_alloc();
+ if (!vdomain)
+ return NULL;
return &vdomain->domain;
}

static int viommu_domain_finalise(struct viommu_endpoint *vdev,
- struct iommu_domain *domain)
+ struct viommu_domain *vdomain)
{
int ret;
unsigned long viommu_page_size;
struct viommu_dev *viommu = vdev->viommu;
- struct viommu_domain *vdomain = to_viommu_domain(domain);

viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
if (viommu_page_size > PAGE_SIZE) {
@@ -680,13 +691,13 @@ static int viommu_domain_finalise(struct viommu_endpoint *vdev,

vdomain->id = (unsigned int)ret;

- domain->pgsize_bitmap = viommu->pgsize_bitmap;
- domain->geometry = viommu->geometry;
+ vdomain->domain.pgsize_bitmap = viommu->pgsize_bitmap;
+ vdomain->domain.geometry = viommu->geometry;

vdomain->map_flags = viommu->map_flags;
vdomain->viommu = viommu;

- if (domain->type == IOMMU_DOMAIN_IDENTITY) {
+ if (vdomain->domain.type == IOMMU_DOMAIN_IDENTITY) {
if (virtio_has_feature(viommu->vdev,
VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
vdomain->bypass = true;
@@ -717,6 +728,24 @@ static void viommu_domain_free(struct iommu_domain *domain)
kfree(vdomain);
}

+static struct iommu_domain *viommu_domain_alloc_paging(struct device *dev)
+{
+ struct viommu_domain *vdomain;
+ vdomain = __viommu_domain_alloc();
+ if (!vdomain)
+ return NULL;
+
+ if (dev) {
+ struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
+
+ if (viommu_domain_finalise(vdev, vdomain)) {
+ viommu_domain_free(&vdomain->domain);
+ return NULL;
+ }
+ }
+ return &vdomain->domain;
+}
+
static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
{
int i;
@@ -732,7 +761,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
* Properly initialize the domain now that we know which viommu
* owns it.
*/
- ret = viommu_domain_finalise(vdev, domain);
+ ret = viommu_domain_finalise(vdev, vdomain);
} else if (vdomain->viommu != vdev->viommu) {
ret = -EINVAL;
}
@@ -1045,6 +1074,7 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap)
static struct iommu_ops viommu_ops = {
.capable = viommu_capable,
.domain_alloc = viommu_domain_alloc,
+ .domain_alloc_paging = viommu_domain_alloc_paging,
.probe_device = viommu_probe_device,
.probe_finalize = viommu_probe_finalize,
.release_device = viommu_release_device,

2023-09-25 14:30:22

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On Mon, Sep 25, 2023 at 02:07:50PM +0100, Robin Murphy wrote:
> On 2023-09-23 00:33, Jason Gunthorpe wrote:
> > On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
> >
> > > virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
> > > either; it sets it once it's discovered any instance, since apparently it's
> > > assuming that all instances must support identical page sizes, and thus once
> > > it's seen one it can work "normally" per the core code's assumptions. It's
> > > also I think the only driver which has a "finalise" bodge but *can* still
> > > properly support map-before-attach, by virtue of having to replay mappings
> > > to every new endpoint anyway.
> >
> > Well it can't quite do that since it doesn't know the geometry - it
> > all is sort of guessing and hoping it doesn't explode on replay. If it
> > knows the geometry it wouldn't need finalize...
>
> I think it's entirely reasonable to assume that any direct mappings
> specified for a device are valid for that device and its IOMMU. However, in
> the particular case of virtio, it really shouldn't ever have direct mappings
> anyway, since even if the underlying hardware did have any, the host can
> enforce the actual direct-mapping aspect itself, and just present them as
> unusable regions to the guest.

I assume this machinery is for the ARM GIC ITS page....

> Again, that's irrelevant. It can only be about whether the actual
> ->map_pages call succeeds or not. A driver could well know up-front that all
> instances support the same pgsize_bitmap and aperture, and set both at
> ->domain_alloc time, yet still be unable to handle an actual mapping without
> knowing which instance(s) that needs to interact with (e.g. omap-iommu).

I think this is a different issue. The domain is supposed to represent
the actual io pte storage, and the storage is supposed to exist even
when the domain is not attached to anything.

As we said with tegra-gart, it is a bug in the driver if all the
mappings disappear when the last device is detached from the domain.
Driver bugs like this turn into significant issues with vfio/iommufd
as this will result in warn_on's and memory leaking.

So, I disagree that this is something we should be allowing in the API
design. map_pages should succeed (memory allocation failures aside) if
a IOVA within the aperture and valid flags are presented. Regardless
of the attachment status. Calling map_pages with an IOVA outside the
aperture should be a caller bug.

It looks omap is just mis-designed to store the pgd in the omap_iommu,
not the omap_iommu_domain :( pgd is clearly a per-domain object in our
API. And why does every instance need its own copy of the identical
pgd?

Jason

2023-09-25 17:26:32

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

On 2023-09-25 14:29, Jason Gunthorpe wrote:
> On Mon, Sep 25, 2023 at 02:07:50PM +0100, Robin Murphy wrote:
>> On 2023-09-23 00:33, Jason Gunthorpe wrote:
>>> On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
>>>
>>>> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
>>>> either; it sets it once it's discovered any instance, since apparently it's
>>>> assuming that all instances must support identical page sizes, and thus once
>>>> it's seen one it can work "normally" per the core code's assumptions. It's
>>>> also I think the only driver which has a "finalise" bodge but *can* still
>>>> properly support map-before-attach, by virtue of having to replay mappings
>>>> to every new endpoint anyway.
>>>
>>> Well it can't quite do that since it doesn't know the geometry - it
>>> all is sort of guessing and hoping it doesn't explode on replay. If it
>>> knows the geometry it wouldn't need finalize...
>>
>> I think it's entirely reasonable to assume that any direct mappings
>> specified for a device are valid for that device and its IOMMU. However, in
>> the particular case of virtio, it really shouldn't ever have direct mappings
>> anyway, since even if the underlying hardware did have any, the host can
>> enforce the actual direct-mapping aspect itself, and just present them as
>> unusable regions to the guest.
>
> I assume this machinery is for the ARM GIC ITS page....
>
>> Again, that's irrelevant. It can only be about whether the actual
>> ->map_pages call succeeds or not. A driver could well know up-front that all
>> instances support the same pgsize_bitmap and aperture, and set both at
>> ->domain_alloc time, yet still be unable to handle an actual mapping without
>> knowing which instance(s) that needs to interact with (e.g. omap-iommu).
>
> I think this is a different issue. The domain is supposed to represent
> the actual io pte storage, and the storage is supposed to exist even
> when the domain is not attached to anything.
>
> As we said with tegra-gart, it is a bug in the driver if all the
> mappings disappear when the last device is detached from the domain.
> Driver bugs like this turn into significant issues with vfio/iommufd
> as this will result in warn_on's and memory leaking.
>
> So, I disagree that this is something we should be allowing in the API
> design. map_pages should succeed (memory allocation failures aside) if
> a IOVA within the aperture and valid flags are presented. Regardless
> of the attachment status. Calling map_pages with an IOVA outside the
> aperture should be a caller bug.
>
> It looks omap is just mis-designed to store the pgd in the omap_iommu,
> not the omap_iommu_domain :( pgd is clearly a per-domain object in our
> API. And why does every instance need its own copy of the identical
> pgd?

The point wasn't that it was necessarily a good and justifiable example,
just that it is one that exists, to demonstrate that in general we have
no reasonable heuristic for guessing whether ->map_pages is going to
succeed or not other than by calling it and seeing if it succeeds or
not. And IMO it's a complete waste of time thinking about ways to make
such a heuristic possible instead of just getting on with fixing
iommu_domain_alloc() to make the problem disappear altogether. Once
Joerg pushes out the current queue I'll rebase and resend v4 of the bus
ops removal, then hopefully get back to despairing at the hideous pile
of WIP iommu_domain_alloc() patches I currently have on top of it...

Thanks,
Robin.