Hi Will,
As a follow-up to the VFIO/IOMMU/PCI "Dual Stage SMMUv3 Status"
session, please find below some further justification for the
SMMUv3 nested stage enablement series.
In the text below I only consider use cases featuring
VFIO-assigned devices, where the physical IOMMU is actually
involved.
The virtio-iommu solution, as currently specified, is expected
to work efficiently as long as the guest IOMMU mappings are static.
This should correspond to the DPDK use case: since the mappings
are set up once, the overhead of trapping on each MAP/UNMAP is
close to zero.
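To illustrate what "static" means here: a DPDK-like guest application
sets up its IOVA mappings once at startup through VFIO, so the
virtio-iommu MAP request is trapped only at that point and never on the
datapath. A minimal sketch (container fd setup and error handling are
omitted, and map_dma_once is just an illustrative name):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Map a buffer once at startup; with virtio-iommu this is the only
 * point where the VMM traps a MAP request for this buffer. */
static int map_dma_once(int container_fd, void *vaddr, __u64 iova, __u64 size)
{
	struct vfio_iommu_type1_dma_map map;

	memset(&map, 0, sizeof(map));
	map.argsz = sizeof(map);
	map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	map.vaddr = (__u64)(uintptr_t)vaddr;
	map.iova  = iova;
	map.size  = size;

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}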
I see two main use cases where the guest uses dynamic mappings:
1) native drivers using the DMA API in the guest
2) shared virtual addressing (SVA) in the guest
1) can be addressed with the current virtio-iommu spec. However
the performance will be very poor: it behaves like the Intel IOMMU
with the driver operating with caching mode and strict mode
set (an 80% performance downgrade is observed versus no IOMMU).
This use case can be tested very easily. A dual stage
implementation should bring much better results here.
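For reference, the pattern that hurts is the usual streaming DMA API
usage in the guest driver: each map and unmap below ends up as a trapped
virtio-iommu MAP/UNMAP request. A minimal guest-kernel sketch
(queue_rx_buffer and the surrounding driver are purely illustrative):

#include <linux/dma-mapping.h>

/* Per-buffer streaming DMA: with virtio-iommu in strict mode, the map
 * and the unmap each translate into a trapped MAP/UNMAP request. */
static int queue_rx_buffer(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma;

	dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);	/* trapped MAP */
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... hand the IOVA to the device, wait for the DMA to complete ... */

	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);	/* trapped UNMAP */
	return 0;
}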
2) the natural implementation for this is nested paging. Jean planned
to introduce extensions to the current virtio-iommu spec to
set up the stage 1 configuration. As far as I understand, this will
require the exact same SMMUv3 driver modifications I introduced in
my series [1][2]. If this happens, after the specification process,
the virtio-iommu driver upgrade and the virtio-iommu QEMU device
upgrade, we will face the same problems as the ones
encountered in my series. This use case cannot be tested
easily: there are in-flight series to support substream IDs
in the SMMU driver and SVA on ARM, but none of that code is
upstream. Also I don't know whether any PASID-capable
device is easily available at the moment. During the
microconference you said you would prefer this use case to be
addressed first, but in my opinion this brings a lot of extra
complexity and dependencies, and the above series are also stalled
due to that exact same issue.
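To give an idea of what the "stage 1 config" covers: once the guest owns
stage 1, the VMM essentially has to push down the guest CD table
location/format and forward the guest's invalidations, which is what the
VFIO interface of my series conveys. The structures below are purely
illustrative (hypothetical names, not the uapi of [1][2]), only meant to
show the information that has to cross the boundary:

#include <stdint.h>

/*
 * Illustrative only, NOT the uapi of [1][2]: roughly the stage 1
 * information the VMM pushes to the host SMMUv3 driver once the guest
 * configures its STE for stage 1 translation.
 */
struct guest_stage1_cfg {
	uint64_t cd_table_gpa;	/* guest PA of the Context Descriptor table */
	uint8_t  s1fmt;		/* linear vs. 2-level CD table */
	uint8_t  s1cdmax;	/* log2 of the number of CDs (substream IDs) */
	uint8_t  enabled;	/* stage 1 translate vs. bypass/abort */
};

/* Guest TLB invalidation to be forwarded to the physical SMMUv3. */
struct guest_tlb_inv {
	uint16_t asid;		/* guest ASID */
	uint64_t iova;		/* address to invalidate, 0 for ASID-wide */
	uint64_t size;
};

/* VMM-side handlers (bodies omitted): on a trapped guest CFGI/TLBI
 * command, forward the data through the VFIO interface the series
 * introduces for PASID table setup and cache invalidation. */
int forward_stage1_cfg(int container_fd, const struct guest_stage1_cfg *cfg);
int forward_tlb_inv(int container_fd, const struct guest_tlb_inv *inv);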
HW nested paging should satisfy all use cases, including
guest static mappings. At the moment it is difficult to
run comparative benchmarks. First, as you may know, virtio-iommu
also suffers from some FW integration delays, and its QEMU VFIO
integration needs to be rebased. Also, I have access to
some systems that feature a dual stage SMMUv3, but I am
not sure their cache/TLB structures are dimensioned for
exercising the two stages (that's a chicken-and-egg issue:
no SW integration, no HW).
If you consider those use cases are not sufficient to justify
investing time now, I have no problem pausing this development.
We can re-open the topic later, when actual users show up
who are interested in reviewing and testing with production HW
and workloads.
Of course, if there are any people or companies interested in
getting this upstream in a decent timeframe, now is the right
moment to let us know!
Thanks
Eric
References:
[1] [PATCH v9 00/11] SMMUv3 Nested Stage Setup (IOMMU part)
https://patchwork.kernel.org/cover/11039871/
[2] [PATCH v9 00/14] SMMUv3 Nested Stage Setup (VFIO part)
https://patchwork.kernel.org/cover/11039995/