This series brings the IOMMU part of HW nested paging support
to the SMMUv3. The VFIO part is submitted separately.
The IOMMU API is extended with 3 new functionalities:
1) pass the guest stage 1 configuration
2) pass stage 1 MSI bindings
3) invalidate stage 1 related caches
Functionality 3) is also used for the SVA use case [1].
Those capabilities are then implemented in the SMMUv3 driver.
The virtualizer passes information through the VFIO user API,
which cascades it to the iommu subsystem. This allows the guest
to own the stage 1 tables and context descriptors (the so-called
PASID table) while the host owns the stage 2 tables and the main
configuration structures (STE).
Best Regards
Eric
This series can be found at:
https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
References:
[1] [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support
History:
v8 -> v9:
- rebase on 5.3
- split iommu/vfio parts
v6 -> v8:
- Implement VFIO-PCI device specific interrupt framework
v7 -> v8:
- rebase on top of v5.2-rc1 and especially
8be39a1a04c1 iommu/arm-smmu-v3: Add a master->domain pointer
- dynamic alloc of s1_cfg/s2_cfg
- __arm_smmu_tlb_inv_asid/s1_range_nosync
- check there are no HW MSI regions
- asid invalidation using pasid extended struct (change in the uapi)
- add s1_live/s2_live checks
- move check about support of nested stages in domain finalise
- fixes in error reporting according to the discussion with Robin
- reordered the patches to have first iommu/smmuv3 patches and then
VFIO patches
v6 -> v7:
- removed device handle from bind/unbind_guest_msi
- added "iommu/smmuv3: Nested mode single MSI doorbell per domain
enforcement"
- added a few uapi comments as suggested by Jean, Jacob and Alex
v5 -> v6:
- Fix compilation issue when CONFIG_IOMMU_API is unset
v4 -> v5:
- fix bug reported by Vincent: fault handler unregistration now happens in
vfio_pci_release
- IOMMU_FAULT_PERM_* moved outside of struct definition + small
uapi changes suggested by Jean-Philippe (except fetch_addr)
- iommu: introduce device fault report API: removed the PRI part.
- see individual logs for more details
- reset the ste abort flag on detach
v3 -> v4:
- took into account Alex's, Jean-Philippe's and Robin's comments on v3
- rework of the smmuv3 driver integration
- add tear down ops for msi binding and PASID table binding
- fix S1 fault propagation
- put fault reporting patches at the beginning of the series following
Jean-Philippe's request
- update of the cache invalidate and fault API uapis
- VFIO fault reporting rework with 2 separate regions and one mmappable
segment for the fault queue
- moved to PATCH
v2 -> v3:
- When registering the S1 MSI binding we now store the device handle. This
addresses Robin's comment about discrimination of devices belonging to
different S1 groups and using different physical MSI doorbells.
- Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX to
set the eventfd and expose the faults through an mmappable fault region
v1 -> v2:
- Added the fault reporting capability
- asid properly passed on invalidation (fix assignment of multiple
devices)
- see individual change logs for more info
Eric Auger (11):
iommu: Introduce bind/unbind_guest_msi
iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg
iommu/smmuv3: Get prepared for nested stage support
iommu/smmuv3: Implement attach/detach_pasid_table
iommu/smmuv3: Introduce __arm_smmu_tlb_inv_asid/s1_range_nosync
iommu/smmuv3: Implement cache_invalidate
dma-iommu: Implement NESTED_MSI cookie
iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
regions
iommu/smmuv3: Implement bind/unbind_guest_msi
iommu/smmuv3: Report non recoverable faults
Jacob Pan (1):
iommu: Introduce attach/detach_pasid_table API
Jean-Philippe Brucker (1):
iommu/arm-smmu-v3: Maintain a SID->device structure
Yi L Liu (1):
iommu: Introduce cache_invalidate API
drivers/iommu/arm-smmu-v3.c | 817 ++++++++++++++++++++++++++++++++----
drivers/iommu/dma-iommu.c | 139 +++++-
drivers/iommu/iommu.c | 66 +++
include/linux/dma-iommu.h | 16 +
include/linux/iommu.h | 52 +++
include/uapi/linux/iommu.h | 161 +++++++
6 files changed, 1162 insertions(+), 89 deletions(-)
--
2.20.1
On ARM, MSIs are translated by the SMMU. An IOVA is allocated
for each MSI doorbell. If both the host and the guest are exposed
with SMMUs, we end up with 2 different IOVAs, one allocated by each.
The guest allocates an IOVA (gIOVA) to map onto the guest MSI
doorbell (gDB). The host allocates another IOVA (hIOVA) to map
onto the physical doorbell (hDB).
So we end up with 2 untied mappings:
         S1            S2
gIOVA    ->    gDB
               hIOVA    ->    hDB
Currently the PCI device is programmed by the host with hIOVA
as the MSI doorbell, so this does not work.
This patch introduces an API to pass gIOVA/gDB to the host so
that gIOVA can be reused by the host instead of re-allocating
a new IOVA. So the goal is to create the following nested mapping:
         S1            S2
gIOVA    ->    gDB     ->    hDB
and program the PCI device with gIOVA as the MSI doorbell.
In case we have several devices attached to this nested domain
(devices belonging to the same group), they cannot be isolated
on the guest side either, so they should also end up in the same
domain on the guest side. We will enforce that all the devices
attached to the host iommu domain use the same physical doorbell
and, similarly, that a single virtual doorbell mapping gets
registered (a single virtual doorbell is used on the guest as well).
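
The nested gIOVA -> gDB -> hDB mapping and the "single binding per
domain" rule described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: every model_*
name is hypothetical and does not exist in the kernel, the -EEXIST
return value for a second binding is an assumption rather than what the
driver necessarily returns, and the translation is reduced to a single
contiguous doorbell window.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_msi_binding {
	bool     bound;
	uint64_t giova;	/* IOVA chosen by the guest (stage 1 input) */
	uint64_t gpa;	/* guest doorbell gDB (stage 1 output) */
	uint64_t size;	/* size of the mapping */
};

struct model_domain {
	struct model_msi_binding msi;	/* at most one binding per domain */
	uint64_t hdb;			/* physical doorbell hDB (stage 2 output) */
};

/* Register the guest gIOVA -> gDB mapping; a second binding is refused. */
static int model_bind_guest_msi(struct model_domain *d, uint64_t giova,
				uint64_t gpa, uint64_t size)
{
	if (d->msi.bound)
		return -EEXIST;	/* assumed error code, for illustration only */
	d->msi = (struct model_msi_binding){ true, giova, gpa, size };
	return 0;
}

/* Withdraw the binding if it matches the given gIOVA. */
static void model_unbind_guest_msi(struct model_domain *d, uint64_t giova)
{
	if (d->msi.bound && d->msi.giova == giova)
		d->msi.bound = false;
}

/* Nested walk: S1 maps gIOVA -> gDB, S2 maps gDB -> hDB. 0 on success. */
static int model_translate(const struct model_domain *d, uint64_t iova,
			   uint64_t *out)
{
	uint64_t off;

	if (!d->msi.bound || iova < d->msi.giova ||
	    iova - d->msi.giova >= d->msi.size)
		return -EFAULT;
	off = iova - d->msi.giova;	/* offset within the doorbell window */
	*out = d->hdb + off;		/* gDB + off is remapped to hDB + off */
	return 0;
}

/* Small self-check: the device is programmed with gIOVA and reaches hDB. */
static void model_demo(void)
{
	struct model_domain d = { .hdb = 0x8080000 };
	uint64_t pa = 0;

	assert(model_bind_guest_msi(&d, 0x1000, 0x8000000, 0x1000) == 0);
	assert(model_translate(&d, 0x1000, &pa) == 0 && pa == 0x8080000);
	model_unbind_guest_msi(&d, 0x1000);
}
```

In this model the device keeps using the guest-chosen gIOVA, and the
host only supplies the second stage, which is the property the commit
message asks for.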
Signed-off-by: Eric Auger <[email protected]>
---
v7 -> v8:
- dummy iommu_unbind_guest_msi turned into a void function
v6 -> v7:
- remove the device handle parameter.
- Add comments saying there can only be a single MSI binding
registered per iommu_domain
v5 -> v6:
- fix compile issue when IOMMU_API is not set
v3 -> v4:
- add unbind
v2 -> v3:
- add a struct device handle
---
drivers/iommu/iommu.c | 37 +++++++++++++++++++++++++++++++++++++
include/linux/iommu.h | 20 ++++++++++++++++++++
2 files changed, 57 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0ec72ffb8efa..ad968f579baa 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1653,6 +1653,43 @@ static void __iommu_detach_device(struct iommu_domain *domain,
trace_detach_device_from_domain(dev);
}
+/**
+ * iommu_bind_guest_msi - Passes the stage1 GIOVA/GPA mapping of a
+ * virtual doorbell
+ *
+ * @domain: iommu domain the stage 1 mapping will be attached to
+ * @giova: iova allocated by the guest
+ * @gpa: guest physical address of the virtual doorbell
+ * @size: granule size used for the mapping
+ *
+ * The associated IOVA can be reused by the host to create a nested
+ * stage2 binding mapping translating into the physical doorbell used
+ * by the devices attached to the domain.
+ *
+ * All devices within the domain must share the same physical doorbell.
+ * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
+ */
+
+int iommu_bind_guest_msi(struct iommu_domain *domain,
+ dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+ if (unlikely(!domain->ops->bind_guest_msi))
+ return -ENODEV;
+
+ return domain->ops->bind_guest_msi(domain, giova, gpa, size);
+}
+EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
+
+void iommu_unbind_guest_msi(struct iommu_domain *domain,
+ dma_addr_t iova)
+{
+ if (unlikely(!domain->ops->unbind_guest_msi))
+ return;
+
+ domain->ops->unbind_guest_msi(domain, iova);
+}
+EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
+
void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
{
struct iommu_group *group;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0314d152df08..6d7cc326e299 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -231,6 +231,8 @@ struct iommu_sva_ops {
* @attach_pasid_table: attach a pasid table
* @detach_pasid_table: detach the pasid table
* @cache_invalidate: invalidate translation caches
+ * @bind_guest_msi: provides a stage1 giova/gpa MSI doorbell mapping
+ * @unbind_guest_msi: withdraw a stage1 giova/gpa MSI doorbell mapping
* @pgsize_bitmap: bitmap of all possible supported page sizes
*/
struct iommu_ops {
@@ -300,6 +302,10 @@ struct iommu_ops {
int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
struct iommu_cache_invalidate_info *inv_info);
+ int (*bind_guest_msi)(struct iommu_domain *domain,
+ dma_addr_t giova, phys_addr_t gpa, size_t size);
+ void (*unbind_guest_msi)(struct iommu_domain *domain, dma_addr_t giova);
+
unsigned long pgsize_bitmap;
};
@@ -409,6 +415,11 @@ extern void iommu_detach_pasid_table(struct iommu_domain *domain);
extern int iommu_cache_invalidate(struct iommu_domain *domain,
struct device *dev,
struct iommu_cache_invalidate_info *inv_info);
+extern int iommu_bind_guest_msi(struct iommu_domain *domain,
+ dma_addr_t giova, phys_addr_t gpa, size_t size);
+extern void iommu_unbind_guest_msi(struct iommu_domain *domain,
+ dma_addr_t giova);
+
extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -969,6 +980,15 @@ iommu_cache_invalidate(struct iommu_domain *domain,
return -ENODEV;
}
+static inline
+int iommu_bind_guest_msi(struct iommu_domain *domain,
+ dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+ return -ENODEV;
+}
+static inline
+void iommu_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova) {}
+
#endif /* CONFIG_IOMMU_API */
#ifdef CONFIG_IOMMU_DEBUGFS
--
2.20.1
When a stage 1 related fault event is read from the event queue,
propagate it to potential external fault listeners, i.e. users
who registered a fault handler.
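
The decode-then-filter step described above can be sketched in
userspace as follows. This is an illustration only: the MODEL_* macros
are stand-ins for the kernel's GENMASK_ULL()/FIELD_GET(), the model_*
names are hypothetical, and the bit positions are taken from the
EVTQ_* definitions added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's GENMASK_ULL() and FIELD_GET(). */
#define MODEL_GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))
/* Divide by the lowest set bit of the mask, i.e. shift the field down. */
#define MODEL_FIELD_GET(mask, reg) \
	(((reg) & (mask)) / ((mask) & ~((mask) << 1)))

/* Field layout mirrors the EVTQ_* definitions in the patch. */
#define MODEL_EVTQ_0_ID			MODEL_GENMASK_ULL(7, 0)
#define MODEL_EVTQ_0_SSV		MODEL_GENMASK_ULL(11, 11)
#define MODEL_EVTQ_0_SUBSTREAMID	MODEL_GENMASK_ULL(31, 12)
#define MODEL_EVTQ_0_STREAMID		MODEL_GENMASK_ULL(63, 32)
#define MODEL_EVTQ_1_RNW		MODEL_GENMASK_ULL(35, 35)
#define MODEL_EVTQ_1_S2			MODEL_GENMASK_ULL(39, 39)

/* Hypothetical decoded view of the first two event dwords. */
struct model_event {
	uint8_t  id;	/* event type */
	bool     ssv;	/* SubstreamID valid */
	uint32_t ssid;	/* SubstreamID (PASID) */
	uint32_t sid;	/* StreamID */
	bool     s2;	/* fault happened at stage 2 */
	bool     read;	/* faulting access was a read */
};

static struct model_event model_decode(const uint64_t evt[2])
{
	struct model_event e = {
		.id   = (uint8_t)MODEL_FIELD_GET(MODEL_EVTQ_0_ID, evt[0]),
		.ssv  = MODEL_FIELD_GET(MODEL_EVTQ_0_SSV, evt[0]) != 0,
		.ssid = (uint32_t)MODEL_FIELD_GET(MODEL_EVTQ_0_SUBSTREAMID, evt[0]),
		.sid  = (uint32_t)MODEL_FIELD_GET(MODEL_EVTQ_0_STREAMID, evt[0]),
		.s2   = MODEL_FIELD_GET(MODEL_EVTQ_1_S2, evt[1]) != 0,
		.read = MODEL_FIELD_GET(MODEL_EVTQ_1_RNW, evt[1]) != 0,
	};
	return e;
}

/*
 * For event types that require the stage check (s1_check), a fault
 * taken at stage 2 stays with the host and is not forwarded.
 */
static bool model_should_propagate(const struct model_event *e, bool s1_check)
{
	return !(s1_check && e->s2);
}

/* Small self-check on a hand-built stage 1 event. */
static void model_decode_demo(void)
{
	uint64_t evt[2] = { (0x42ULL << 32) | 0x10, 0 };
	struct model_event e = model_decode(evt);

	assert(e.sid == 0x42 && e.id == 0x10);
	assert(model_should_propagate(&e, true));
}
```

The point of the split is that decoding is unconditional while the
decision to forward depends on both the per-event table entry and the
S2 bit of the record.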
Signed-off-by: Eric Auger <[email protected]>
---
v8 -> v9:
- adapt to the removal of IOMMU_FAULT_UNRECOV_PERM_VALID:
only look at IOMMU_FAULT_UNRECOV_ADDR_VALID which comes with
perm
- do not advertise IOMMU_FAULT_UNRECOV_PASID_VALID faults for
translation faults
- trace errors if !master
- test nested before calling iommu_report_device_fault
- call the fault handler unconditionally in non-nested mode
v4 -> v5:
- s/IOMMU_FAULT_PERM_INST/IOMMU_FAULT_PERM_EXEC
---
drivers/iommu/arm-smmu-v3.c | 182 +++++++++++++++++++++++++++++++++---
1 file changed, 171 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 641f1058ef51..79229560c167 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -169,6 +169,26 @@
#define ARM_SMMU_PRIQ_IRQ_CFG1 0xd8
#define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc
+/* Events */
+#define ARM_SMMU_EVT_F_UUT 0x01
+#define ARM_SMMU_EVT_C_BAD_STREAMID 0x02
+#define ARM_SMMU_EVT_F_STE_FETCH 0x03
+#define ARM_SMMU_EVT_C_BAD_STE 0x04
+#define ARM_SMMU_EVT_F_BAD_ATS_TREQ 0x05
+#define ARM_SMMU_EVT_F_STREAM_DISABLED 0x06
+#define ARM_SMMU_EVT_F_TRANSL_FORBIDDEN 0x07
+#define ARM_SMMU_EVT_C_BAD_SUBSTREAMID 0x08
+#define ARM_SMMU_EVT_F_CD_FETCH 0x09
+#define ARM_SMMU_EVT_C_BAD_CD 0x0a
+#define ARM_SMMU_EVT_F_WALK_EABT 0x0b
+#define ARM_SMMU_EVT_F_TRANSLATION 0x10
+#define ARM_SMMU_EVT_F_ADDR_SIZE 0x11
+#define ARM_SMMU_EVT_F_ACCESS 0x12
+#define ARM_SMMU_EVT_F_PERMISSION 0x13
+#define ARM_SMMU_EVT_F_TLB_CONFLICT 0x20
+#define ARM_SMMU_EVT_F_CFG_CONFLICT 0x21
+#define ARM_SMMU_EVT_E_PAGE_REQUEST 0x24
+
/* Common MSI config fields */
#define MSI_CFG0_ADDR_MASK GENMASK_ULL(51, 2)
#define MSI_CFG2_SH GENMASK(5, 4)
@@ -350,6 +370,15 @@
#define EVTQ_MAX_SZ_SHIFT (Q_MAX_SZ_SHIFT - EVTQ_ENT_SZ_SHIFT)
#define EVTQ_0_ID GENMASK_ULL(7, 0)
+#define EVTQ_0_SSV GENMASK_ULL(11, 11)
+#define EVTQ_0_SUBSTREAMID GENMASK_ULL(31, 12)
+#define EVTQ_0_STREAMID GENMASK_ULL(63, 32)
+#define EVTQ_1_PNU GENMASK_ULL(33, 33)
+#define EVTQ_1_IND GENMASK_ULL(34, 34)
+#define EVTQ_1_RNW GENMASK_ULL(35, 35)
+#define EVTQ_1_S2 GENMASK_ULL(39, 39)
+#define EVTQ_1_CLASS GENMASK_ULL(41, 40)
+#define EVTQ_3_FETCH_ADDR GENMASK_ULL(51, 3)
/* PRI queue */
#define PRIQ_ENT_SZ_SHIFT 4
@@ -655,6 +684,57 @@ struct arm_smmu_domain {
spinlock_t devices_lock;
};
+/* fault propagation */
+struct arm_smmu_fault_propagation_data {
+ enum iommu_fault_reason reason;
+ bool s1_check;
+ u32 fields; /* IOMMU_FAULT_UNRECOV_*_VALID bits */
+};
+
+/*
+ * Describes how SMMU faults translate into generic IOMMU faults
+ * and if they need to be reported externally
+ */
+static const struct arm_smmu_fault_propagation_data fault_propagation[] = {
+[ARM_SMMU_EVT_F_UUT] = { },
+[ARM_SMMU_EVT_C_BAD_STREAMID] = { },
+[ARM_SMMU_EVT_F_STE_FETCH] = { },
+[ARM_SMMU_EVT_C_BAD_STE] = { },
+[ARM_SMMU_EVT_F_BAD_ATS_TREQ] = { },
+[ARM_SMMU_EVT_F_STREAM_DISABLED] = { },
+[ARM_SMMU_EVT_F_TRANSL_FORBIDDEN] = { },
+[ARM_SMMU_EVT_C_BAD_SUBSTREAMID] = {IOMMU_FAULT_REASON_PASID_INVALID,
+ false,
+ IOMMU_FAULT_UNRECOV_PASID_VALID
+ },
+[ARM_SMMU_EVT_F_CD_FETCH] = {IOMMU_FAULT_REASON_PASID_FETCH,
+ false,
+ IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+ },
+[ARM_SMMU_EVT_C_BAD_CD] = {IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+ false,
+ },
+[ARM_SMMU_EVT_F_WALK_EABT] = {IOMMU_FAULT_REASON_WALK_EABT, true,
+ IOMMU_FAULT_UNRECOV_ADDR_VALID |
+ IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_TRANSLATION] = {IOMMU_FAULT_REASON_PTE_FETCH, true,
+ IOMMU_FAULT_UNRECOV_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_ADDR_SIZE] = {IOMMU_FAULT_REASON_OOR_ADDRESS, true,
+ IOMMU_FAULT_UNRECOV_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_ACCESS] = {IOMMU_FAULT_REASON_ACCESS, true,
+ IOMMU_FAULT_UNRECOV_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_PERMISSION] = {IOMMU_FAULT_REASON_PERMISSION, true,
+ IOMMU_FAULT_UNRECOV_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_TLB_CONFLICT] = { },
+[ARM_SMMU_EVT_F_CFG_CONFLICT] = { },
+[ARM_SMMU_EVT_E_PAGE_REQUEST] = { },
+};
+
struct arm_smmu_option_prop {
u32 opt;
const char *prop;
@@ -1332,7 +1412,6 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
return 0;
}
-__maybe_unused
static struct arm_smmu_master *
arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
{
@@ -1358,24 +1437,105 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
return master;
}
+/* Populates the record fields according to the input SMMU event */
+static bool arm_smmu_transcode_fault(u64 *evt, u8 type,
+ struct iommu_fault_unrecoverable *record)
+{
+ const struct arm_smmu_fault_propagation_data *data;
+ u32 fields;
+
+ if (type >= ARRAY_SIZE(fault_propagation))
+ return false;
+
+ data = &fault_propagation[type];
+ if (!data->reason)
+ return false;
+
+ fields = data->fields;
+
+ if (data->s1_check && FIELD_GET(EVTQ_1_S2, evt[1]))
+ return false; /* S2 related fault, don't propagate */
+
+ if (fields & IOMMU_FAULT_UNRECOV_PASID_VALID)
+ record->pasid = FIELD_GET(EVTQ_0_SUBSTREAMID, evt[0]);
+ else {
+ /* all other transcoded errors have SSV */
+ if (FIELD_GET(EVTQ_0_SSV, evt[0])) {
+ record->pasid = FIELD_GET(EVTQ_0_SUBSTREAMID, evt[0]);
+ fields |= IOMMU_FAULT_UNRECOV_PASID_VALID;
+ }
+ }
+
+ if (fields & IOMMU_FAULT_UNRECOV_ADDR_VALID) {
+ if (FIELD_GET(EVTQ_1_RNW, evt[1]))
+ record->perm = IOMMU_FAULT_PERM_READ;
+ else
+ record->perm = IOMMU_FAULT_PERM_WRITE;
+ if (FIELD_GET(EVTQ_1_PNU, evt[1]))
+ record->perm |= IOMMU_FAULT_PERM_PRIV;
+ if (FIELD_GET(EVTQ_1_IND, evt[1]))
+ record->perm |= IOMMU_FAULT_PERM_EXEC;
+ record->addr = evt[2];
+ }
+
+ if (fields & IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID)
+ record->fetch_addr = FIELD_GET(EVTQ_3_FETCH_ADDR, evt[3]);
+
+ record->flags = fields;
+ record->reason = data->reason;
+ return true;
+}
+
+static void arm_smmu_report_event(struct arm_smmu_device *smmu, u64 *evt)
+{
+ u32 sid = FIELD_GET(EVTQ_0_STREAMID, evt[0]);
+ u8 type = FIELD_GET(EVTQ_0_ID, evt[0]);
+ struct arm_smmu_master *master;
+ struct iommu_fault_event event = {};
+ bool nested;
+ int i;
+
+ master = arm_smmu_find_master(smmu, sid);
+ if (!master || !master->domain)
+ goto out;
+
+ event.fault.type = IOMMU_FAULT_DMA_UNRECOV;
+
+ nested = (master->domain->stage == ARM_SMMU_DOMAIN_NESTED);
+
+ if (nested) {
+ if (arm_smmu_transcode_fault(evt, type, &event.fault.event)) {
+ /*
+ * Only S1 related faults should be reported to the
+ * guest and must not flood the host log.
+ * Also a fault handler should have been registered
+ * to guarantee the full nested functionality
+ */
+ WARN_ON_ONCE(iommu_report_device_fault(master->dev,
+ &event));
+ return;
+ }
+ } else {
+ iommu_report_device_fault(master->dev, &event);
+ }
+out:
+ dev_info(smmu->dev, "event 0x%02x received:\n", type);
+ for (i = 0; i < EVTQ_ENT_DWORDS; ++i) {
+ dev_info(smmu->dev, "\t0x%016llx\n",
+ (unsigned long long)evt[i]);
+ }
+}
+
/* IRQ and event handlers */
static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
{
- int i;
struct arm_smmu_device *smmu = dev;
struct arm_smmu_queue *q = &smmu->evtq.q;
u64 evt[EVTQ_ENT_DWORDS];
do {
- while (!queue_remove_raw(q, evt)) {
- u8 id = FIELD_GET(EVTQ_0_ID, evt[0]);
-
- dev_info(smmu->dev, "event 0x%02x received:\n", id);
- for (i = 0; i < ARRAY_SIZE(evt); ++i)
- dev_info(smmu->dev, "\t0x%016llx\n",
- (unsigned long long)evt[i]);
-
- }
+ while (!queue_remove_raw(q, evt))
+ arm_smmu_report_event(smmu, evt);
/*
* Not much we can do on overflow, so scream and pretend we're
--
2.20.1