2023-01-04 16:00:08

by Niklas Schnelle

[permalink] [raw]
Subject: [PATCH v2 0/1] vfio/type1: Fix vfio-pci pass-through of ISM devices

Hi Alex,

This is v2 of my attempt of fixing an issue we have on s390 with vfio-pci
pass-through of the s390 specific virtual PCI device called ISM and used for
cross LPAR communication. As the patch tries to explain the fact that
vfio_test_domain_fgsp() uses an IOMMU mapping at IOVA 0 irrespective of any
reserved regions causes the ISM device to go into an error state and thus
becomes unusable for a KVM guest breaking pass-through. I tried to improve
the background and explanation compared to v1 hope its more clear now.

As for testing, I tested this based on current master on both on s390 where it
skips the reserved 0x0-0x100000000 range and on an AMD Ryzen 3990X where it
continues to do the test on DMA address 0 and sets domain->fgsp to true.

Thanks,
Niklas Schnelle

Changes since v1:
- Reworded commit message to hopefully explain things a bit better and
highlight that usually just mapping but not issuing DMAs for IOVAs in
a resverved region is harmless but still breaks things with ISM devices.
- Added a check for PAGE_SIZE * 2 alignment (Jason)

Niklas Schnelle (1):
vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()

drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)

--
2.34.1


2023-01-04 16:02:33

by Niklas Schnelle

[permalink] [raw]
Subject: [PATCH v2 1/1] vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()

Since commit cbf7827bc5dc ("iommu/s390: Fix potential s390_domain
aperture shrinking") the s390 IOMMU driver uses reserved regions for the
system provided DMA ranges of PCI devices. Previously it reduced the
size of the IOMMU aperture and checked it on each mapping operation.
On current machines the system denies use of DMA addresses below 2^32 for
all PCI devices.

Usually mapping IOVAs in a reserved regions is harmless until a DMA
actually tries to utilize the mapping. However on s390 there is
a virtual PCI device called ISM which is implemented in firmware and
used for cross LPAR communication. Unlike real PCI devices this device
does not use the hardware IOMMU but inspects IOMMU translation tables
directly on IOTLB flush (s390 RPCIT instruction). If it detects IOVA
mappings outside the allowed ranges it goes into an error state. This
error state then causes the device to be unavailable to the KVM guest.

Analysing this we found that vfio_test_domain_fgsp() maps 2 pages at DMA
address 0 irrespective of the IOMMUs reserved regions. Even if usually
harmless this seems wrong in the general case so instead go through the
freshly updated IOVA list and try to find a range that isn't reserved,
and fits 2 pages, is PAGE_SIZE * 2 aligned. If found use that for
testing for fine grained super pages.

Fixes: 6fe1010d6d9c ("vfio/type1: DMA unmap chunking")
Reported-by: Matthew Rosato <[email protected]>
Signed-off-by: Niklas Schnelle <[email protected]>
---
v1 -> v2:
- Reworded commit message to hopefully explain things a bit better and
highlight that usually just mapping but not issuing DMAs for IOVAs in
a resverved region is harmless but still breaks things with ISM devices.
- Added a check for PAGE_SIZE * 2 alignment (Jason)

drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 23c24fe98c00..87b27ffb93d0 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1856,24 +1856,32 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
* significantly boosts non-hugetlbfs mappings and doesn't seem to hurt when
* hugetlbfs is in use.
*/
-static void vfio_test_domain_fgsp(struct vfio_domain *domain)
+static void vfio_test_domain_fgsp(struct vfio_domain *domain, struct list_head *regions)
{
- struct page *pages;
int ret, order = get_order(PAGE_SIZE * 2);
+ struct vfio_iova *region;
+ struct page *pages;

pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
if (!pages)
return;

- ret = iommu_map(domain->domain, 0, page_to_phys(pages), PAGE_SIZE * 2,
- IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
- if (!ret) {
- size_t unmapped = iommu_unmap(domain->domain, 0, PAGE_SIZE);
+ list_for_each_entry(region, regions, list) {
+ if (region->end - region->start < PAGE_SIZE * 2 ||
+ region->start % (PAGE_SIZE*2))
+ continue;

- if (unmapped == PAGE_SIZE)
- iommu_unmap(domain->domain, PAGE_SIZE, PAGE_SIZE);
- else
- domain->fgsp = true;
+ ret = iommu_map(domain->domain, region->start, page_to_phys(pages), PAGE_SIZE * 2,
+ IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
+ if (!ret) {
+ size_t unmapped = iommu_unmap(domain->domain, region->start, PAGE_SIZE);
+
+ if (unmapped == PAGE_SIZE)
+ iommu_unmap(domain->domain, region->start + PAGE_SIZE, PAGE_SIZE);
+ else
+ domain->fgsp = true;
+ }
+ break;
}

__free_pages(pages, order);
@@ -2326,7 +2334,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
}
}

- vfio_test_domain_fgsp(domain);
+ vfio_test_domain_fgsp(domain, &iova_copy);

/* replay mappings on new domains */
ret = vfio_iommu_replay(iommu, domain);
--
2.34.1

2023-01-05 16:16:02

by Matthew Rosato

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()

On 1/4/23 10:42 AM, Niklas Schnelle wrote:
> Since commit cbf7827bc5dc ("iommu/s390: Fix potential s390_domain
> aperture shrinking") the s390 IOMMU driver uses reserved regions for the
> system provided DMA ranges of PCI devices. Previously it reduced the
> size of the IOMMU aperture and checked it on each mapping operation.
> On current machines the system denies use of DMA addresses below 2^32 for
> all PCI devices.
>
> Usually mapping IOVAs in a reserved regions is harmless until a DMA
> actually tries to utilize the mapping. However on s390 there is
> a virtual PCI device called ISM which is implemented in firmware and
> used for cross LPAR communication. Unlike real PCI devices this device
> does not use the hardware IOMMU but inspects IOMMU translation tables
> directly on IOTLB flush (s390 RPCIT instruction). If it detects IOVA
> mappings outside the allowed ranges it goes into an error state. This
> error state then causes the device to be unavailable to the KVM guest.
>
> Analysing this we found that vfio_test_domain_fgsp() maps 2 pages at DMA
> address 0 irrespective of the IOMMUs reserved regions. Even if usually
> harmless this seems wrong in the general case so instead go through the
> freshly updated IOVA list and try to find a range that isn't reserved,
> and fits 2 pages, is PAGE_SIZE * 2 aligned. If found use that for
> testing for fine grained super pages.
>
> Fixes: 6fe1010d6d9c ("vfio/type1: DMA unmap chunking")
> Reported-by: Matthew Rosato <[email protected]>
> Signed-off-by: Niklas Schnelle <[email protected]>

Thanks, this fixes the issue I'm seeing with ISM.

Reviewed-by: Matthew Rosato <[email protected]>


2023-01-06 17:41:37

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()

On Wed, 4 Jan 2023 16:42:02 +0100
Niklas Schnelle <[email protected]> wrote:

> Since commit cbf7827bc5dc ("iommu/s390: Fix potential s390_domain
> aperture shrinking") the s390 IOMMU driver uses reserved regions for the

Are you asking for this in v6.2? Seems like the above was introduced
in v6.2 and I can't tell if this is sufficiently prevalent that we need
a fix in the same release.

> system provided DMA ranges of PCI devices. Previously it reduced the
> size of the IOMMU aperture and checked it on each mapping operation.
> On current machines the system denies use of DMA addresses below 2^32 for
> all PCI devices.
>
> Usually mapping IOVAs in a reserved regions is harmless until a DMA
> actually tries to utilize the mapping. However on s390 there is
> a virtual PCI device called ISM which is implemented in firmware and
> used for cross LPAR communication. Unlike real PCI devices this device
> does not use the hardware IOMMU but inspects IOMMU translation tables
> directly on IOTLB flush (s390 RPCIT instruction). If it detects IOVA
> mappings outside the allowed ranges it goes into an error state. This
> error state then causes the device to be unavailable to the KVM guest.
>
> Analysing this we found that vfio_test_domain_fgsp() maps 2 pages at DMA
> address 0 irrespective of the IOMMUs reserved regions. Even if usually
> harmless this seems wrong in the general case so instead go through the
> freshly updated IOVA list and try to find a range that isn't reserved,
> and fits 2 pages, is PAGE_SIZE * 2 aligned. If found use that for
> testing for fine grained super pages.
>
> Fixes: 6fe1010d6d9c ("vfio/type1: DMA unmap chunking")

Nit, the above patch pre-dates any notion of reserved regions, so isn't
this actually fixing the implementation of reserved regions in type1 to
include this test? ie.

Fixes: af029169b8fd ("vfio/type1: Check reserved region conflict and update iovalist")

> Reported-by: Matthew Rosato <[email protected]>
> Signed-off-by: Niklas Schnelle <[email protected]>
> ---
> v1 -> v2:
> - Reworded commit message to hopefully explain things a bit better and
> highlight that usually just mapping but not issuing DMAs for IOVAs in
> a resverved region is harmless but still breaks things with ISM devices.
> - Added a check for PAGE_SIZE * 2 alignment (Jason)
>
> drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++++++++-----------
> 1 file changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 23c24fe98c00..87b27ffb93d0 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1856,24 +1856,32 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
> * significantly boosts non-hugetlbfs mappings and doesn't seem to hurt when
> * hugetlbfs is in use.
> */
> -static void vfio_test_domain_fgsp(struct vfio_domain *domain)
> +static void vfio_test_domain_fgsp(struct vfio_domain *domain, struct list_head *regions)
> {
> - struct page *pages;
> int ret, order = get_order(PAGE_SIZE * 2);
> + struct vfio_iova *region;
> + struct page *pages;
>
> pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
> if (!pages)
> return;
>
> - ret = iommu_map(domain->domain, 0, page_to_phys(pages), PAGE_SIZE * 2,
> - IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
> - if (!ret) {
> - size_t unmapped = iommu_unmap(domain->domain, 0, PAGE_SIZE);
> + list_for_each_entry(region, regions, list) {
> + if (region->end - region->start < PAGE_SIZE * 2 ||
> + region->start % (PAGE_SIZE*2))

Maybe this falls into the noise, but we don't care if region->start is
aligned to a double page, so long as we can map an aligned double page
within the region. Maybe something like:

dma_addr_t start = ALIGN(region->start, PAGE_SIZE * 2);

if (start >= region->end || (region->end - start < PAGE_SIZE * 2))
continue;


s/region->// for below if so. Thanks,

Alex

> + continue;
>
> - if (unmapped == PAGE_SIZE)
> - iommu_unmap(domain->domain, PAGE_SIZE, PAGE_SIZE);
> - else
> - domain->fgsp = true;
> + ret = iommu_map(domain->domain, region->start, page_to_phys(pages), PAGE_SIZE * 2,
> + IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
> + if (!ret) {
> + size_t unmapped = iommu_unmap(domain->domain, region->start, PAGE_SIZE);
> +
> + if (unmapped == PAGE_SIZE)
> + iommu_unmap(domain->domain, region->start + PAGE_SIZE, PAGE_SIZE);
> + else
> + domain->fgsp = true;
> + }
> + break;
> }
>
> __free_pages(pages, order);
> @@ -2326,7 +2334,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> }
> }
>
> - vfio_test_domain_fgsp(domain);
> + vfio_test_domain_fgsp(domain, &iova_copy);
>
> /* replay mappings on new domains */
> ret = vfio_iommu_replay(iommu, domain);

2023-01-06 18:40:22

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()

On Fri, Jan 06, 2023 at 10:24:50AM -0700, Alex Williamson wrote:

> > - ret = iommu_map(domain->domain, 0, page_to_phys(pages), PAGE_SIZE * 2,
> > - IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
> > - if (!ret) {
> > - size_t unmapped = iommu_unmap(domain->domain, 0, PAGE_SIZE);
> > + list_for_each_entry(region, regions, list) {
> > + if (region->end - region->start < PAGE_SIZE * 2 ||
> > + region->start % (PAGE_SIZE*2))
>
> Maybe this falls into the noise, but we don't care if region->start is
> aligned to a double page, so long as we can map an aligned double page
> within the region. Maybe something like:

> dma_addr_t start = ALIGN(region->start, PAGE_SIZE * 2);
>
> if (start >= region->end || (region->end - start < PAGE_SIZE * 2))
> continue;

Yeah, that is more technically correct

Jason

2023-01-09 09:28:20

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()

On Fri, 2023-01-06 at 10:24 -0700, Alex Williamson wrote:
> On Wed, 4 Jan 2023 16:42:02 +0100
> Niklas Schnelle <[email protected]> wrote:
>
> > Since commit cbf7827bc5dc ("iommu/s390: Fix potential s390_domain
> > aperture shrinking") the s390 IOMMU driver uses reserved regions for the
>
> Are you asking for this in v6.2? Seems like the above was introduced
> in v6.2 and I can't tell if this is sufficiently prevalent that we need
> a fix in the same release.

If possible yes I'd hope for this to go into v6.2 so we don't break ISM
pass-through. Support for ISM pass-through has only been available
since v6.0 where Matt added the interpretation support but it is one of
the most useful pass-through uses at the moment since the ISM device
uses long lived DMA mappings and as such is pretty much unaffected by
the performance impact of our virtualized IOMMUs. Together with SMD-D
this allows high performance communication of VMs with other VMs or
LPARs. Now we don't have this problem in distribution kernels that have
Matt's patches as backports but lack my newer IOMMU changes and few if
any of our customers run upstream but still of course I'd prefer not to
have known broken upstream releases.

Thanks,
Niklas

2023-01-09 09:43:33

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()

On Fri, 2023-01-06 at 14:03 -0400, Jason Gunthorpe wrote:
> On Fri, Jan 06, 2023 at 10:24:50AM -0700, Alex Williamson wrote:
>
> > > - ret = iommu_map(domain->domain, 0, page_to_phys(pages), PAGE_SIZE * 2,
> > > - IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
> > > - if (!ret) {
> > > - size_t unmapped = iommu_unmap(domain->domain, 0, PAGE_SIZE);
> > > + list_for_each_entry(region, regions, list) {
> > > + if (region->end - region->start < PAGE_SIZE * 2 ||
> > > + region->start % (PAGE_SIZE*2))
> >
> > Maybe this falls into the noise, but we don't care if region->start is
> > aligned to a double page, so long as we can map an aligned double page
> > within the region. Maybe something like:
>
> > dma_addr_t start = ALIGN(region->start, PAGE_SIZE * 2);
> >
> > if (start >= region->end || (region->end - start < PAGE_SIZE * 2))
> > continue;
>
> Yeah, that is more technically correct
>
> Jason

Makes sense, will incorporate this into v3.

Thanks,
Niklas