[ This series is rebased on top of v6.4-rc1, merging Jason's iommu_hwpt
branch and Yi's vfio cdev v11 branch, then the replace v7 series, the
nesting v2 (candidate) series, and the Intel VT-d series. Note that
some of them are still being finalized, so there can be minor API
changes that are not yet reflected in this series. Yet, we can start
the review with the SMMU-driver-specific changes.
@robin, the hw_info patch still requires the errata patch that you
mentioned. Perhaps we can merge that separately or include it in v3.
Thanks! ]
Changelog
v2:
* Added arm_smmu_set_dev_data after the set_dev_data series.
* Added Jason's patch "vfio: Remove VFIO_TYPE1_NESTING_IOMMU"
* Replaced the iommu_get_unmanaged_domain() helper with Robin's patch.
* Reworked the code in arm_smmu_cmdq_build_cmd() to make NH_VA a
  superset of NH_VAA.
* Added inline comments and a bug-report link to the patch unsetting
dst[2] and dst[3] of STE.
* Dropped the to_s2_cfg helper since only one place really needs it.
* Dropped the VMID (override) flag and s2vmid in the iommu_hwpt_arm_smmuv3
  structure, because user space is expected to use a shared S2 domain/hwpt
  for all devices, i.e. the VMID allocated with the S2 domain is already
  unified. If there's some special case that still needs VMID unification,
  we should probably add it incrementally.
* Moved the introduction of the "struct arm_smmu_domain *s2" function
  parameter to the proper patch.
* Redefined "struct iommu_hwpt_arm_smmuv3" by adding ste_uptr/len and
out_event_uptr/len. Then added an arm_smmu_domain_finalise_nested()
function to read guest Stream Table Entry with a proper sanity.
* Reworked arm_smmu_cache_invalidate_user() by reading the guest CMDQ
directly, to support batching. Also, added return value feedback of
-ETIMEDOUT at CMD_SYNC, and reported CERROR_ILL errors via the CONS
in the user_data structure.
* Updated data/functions following the nesting infrastructure updates.
* Added/fixed multiple comments per v1 review inputs.
v1:
https://lore.kernel.org/all/[email protected]/
--------------------------------------------------------------------------
Hi all,
This series of patches adds nested translation support for ARM SMMUv3.
Eric Auger previously made a huge effort with the VFIO uAPIs, and sent
his v16 a year ago. Now, nested translation should follow the new
IOMMUFD uAPI design. So, most of the key features are ported from the
previous VFIO solution and then rebuilt on top of the IOMMUFD nesting
infrastructure.
The essential parts in the driver to support nested translation are the
->hw_info, ->domain_alloc_user and ->cache_invalidate_user ops. So this
series fundamentally adds these three functions to the SMMUv3 driver,
along with several preparations and cleanups for them.
One unique requirement for SMMUv3 nested translation support is the MSI
doorbell address translation, which is a 2-stage translation too. To work
with the ITS driver, an msi_cookie needs to be set up on the kernel-managed
domain, i.e. the stage-2 domain of the nesting setup. The same msi_cookie
will then be fetched, via iommu_dma_get_msi_mapping_domain(), in the iommu
core to allocate and create IOVA mappings for MSI doorbell page(s).
However, with the nesting design, the device is attached to a user-managed
domain, the stage-1 domain. So neither the setup nor the fetching of the
msi_cookie would work at the stage-2 domain level. Thus, on both sides, the
msi_cookie setup and fetching require a redirection of the domain pointer.
It's easy to do so in the iommufd core, but it needs a new op in the iommu
core and driver.
You can also find this series on the Github:
https://github.com/nicolinc/iommufd/commits/iommufd_nesting-v2
The kernel branch is tested with this QEMU branch:
https://github.com/nicolinc/qemu/commits/wip/iommufd_rfcv4+nesting+smmuv3-v2
Thanks!
Nicolin Chen
Eric Auger (2):
iommu/arm-smmu-v3: Unset corresponding STE fields when s2_cfg is NULL
iommu/arm-smmu-v3: Add STRTAB_STE_0_CFG_NESTED for 2-stage translation
Jason Gunthorpe (1):
vfio: Remove VFIO_TYPE1_NESTING_IOMMU
Nicolin Chen (13):
iommufd: Add nesting related data structures for ARM SMMUv3
iommufd/device: Setup MSI on kernel-managed domains
iommu/arm-smmu-v3: Add arm_smmu_hw_info
iommu/arm-smmu-v3: Add arm_smmu_set/unset_dev_user_data
iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
iommu/arm-smmu-v3: Allow ARM_SMMU_DOMAIN_S1 stage to access s2_cfg
iommu/arm-smmu-v3: Add s1dss in struct arm_smmu_s1_cfg
iommu/arm-smmu-v3: Pass in user_cfg to arm_smmu_domain_finalise
iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user
iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations
iommu/arm-smmu-v3: Implement arm_smmu_get_msi_mapping_domain
iommu/arm-smmu-v3: Add CMDQ_OP_TLBI_NH_VAA and CMDQ_OP_TLBI_NH_ALL
iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
Robin Murphy (1):
iommu/dma: Support MSIs through nested domains
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 461 ++++++++++++++++++--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 11 +-
drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 -
drivers/iommu/dma-iommu.c | 18 +-
drivers/iommu/iommu.c | 10 -
drivers/iommu/iommufd/device.c | 5 +-
drivers/iommu/iommufd/main.c | 1 +
drivers/iommu/iommufd/vfio_compat.c | 7 +-
drivers/vfio/vfio_iommu_type1.c | 12 +-
include/linux/iommu.h | 7 +-
include/uapi/linux/iommufd.h | 83 ++++
include/uapi/linux/vfio.h | 2 +-
12 files changed, 538 insertions(+), 95 deletions(-)
--
2.40.1
The struct iommu_hwpt_arm_smmuv3 contains the userspace Stream Table Entry
info (for ARM_SMMU_DOMAIN_S1) and an "S2" flag (for ARM_SMMU_DOMAIN_S2).
Pass in a valid user_cfg pointer, so arm_smmu_domain_finalise() can handle
both types of user domain finalisation.
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8b827247f4b9..b8c189b732ba 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -26,6 +26,7 @@
#include <linux/pci.h>
#include <linux/pci-ats.h>
#include <linux/platform_device.h>
+#include <uapi/linux/iommufd.h>
#include "arm-smmu-v3.h"
#include "../../dma-iommu.h"
@@ -2211,7 +2212,8 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
}
static int arm_smmu_domain_finalise(struct iommu_domain *domain,
- struct arm_smmu_master *master)
+ struct arm_smmu_master *master,
+ const struct iommu_hwpt_arm_smmuv3 *user_cfg)
{
int ret;
unsigned long ias, oas;
@@ -2223,12 +2225,18 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
struct io_pgtable_cfg *);
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
+ bool user_cfg_s2 = user_cfg && (user_cfg->flags & IOMMU_SMMUV3_FLAG_S2);
if (domain->type == IOMMU_DOMAIN_IDENTITY) {
smmu_domain->stage = ARM_SMMU_DOMAIN_BYPASS;
return 0;
}
+ if (user_cfg_s2 && !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
+ return -EINVAL;
+ if (user_cfg_s2)
+ smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
+
/* Restrict the stage to what we can actually support */
if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
@@ -2472,7 +2480,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
if (!smmu_domain->smmu) {
smmu_domain->smmu = smmu;
- ret = arm_smmu_domain_finalise(domain, master);
+ ret = arm_smmu_domain_finalise(domain, master, NULL);
if (ret) {
smmu_domain->smmu = NULL;
goto out_unlock;
--
2.40.1
The arm_smmu_domain_alloc_user callback is used by userspace to allocate
iommu_domains, such as a standalone stage-1 domain, a nested stage-1
domain, or a nested stage-2 domain. The input user_data is of type
struct iommu_hwpt_arm_smmuv3 and contains the configuration of a nested
stage-1 or a nested stage-2 iommu_domain. A NULL user_data simply results
in a standalone stage-1 domain allocation.
Add a common helper __arm_smmu_domain_alloc to support that.
Since ops->domain_alloc_user is passed a valid dev pointer, the master
pointer is available when calling __arm_smmu_domain_alloc() in this case,
meaning that arm_smmu_domain_finalise() can be done at the allocation
stage. This allows IOMMUFD to initialize the hw_pagetable for the domain.
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 117 +++++++++++++++-----
1 file changed, 87 insertions(+), 30 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b8c189b732ba..75ee928c2390 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2051,36 +2051,6 @@ static void *arm_smmu_hw_info(struct device *dev, u32 *length)
return info;
}
-static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
-{
- struct arm_smmu_domain *smmu_domain;
-
- if (type == IOMMU_DOMAIN_SVA)
- return arm_smmu_sva_domain_alloc();
-
- if (type != IOMMU_DOMAIN_UNMANAGED &&
- type != IOMMU_DOMAIN_DMA &&
- type != IOMMU_DOMAIN_DMA_FQ &&
- type != IOMMU_DOMAIN_IDENTITY)
- return NULL;
-
- /*
- * Allocate the domain and initialise some of its data structures.
- * We can't really do anything meaningful until we've added a
- * master.
- */
- smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
- if (!smmu_domain)
- return NULL;
-
- mutex_init(&smmu_domain->init_mutex);
- INIT_LIST_HEAD(&smmu_domain->devices);
- spin_lock_init(&smmu_domain->devices_lock);
- INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
-
- return &smmu_domain->domain;
-}
-
static int arm_smmu_bitmap_alloc(unsigned long *map, int span)
{
int idx, size = 1 << span;
@@ -2928,10 +2898,97 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
}
+/**
+ * __arm_smmu_domain_alloc - Allocate a customizable iommu_domain
+ * @type: Type of the new iommu_domain, in form of IOMMU_DOMAIN_*
+ * @master: Optional master pointer for the allocation. If given, this will be
+ * used to call arm_smmu_domain_finalise at the end of the allocation.
+ * Otherwise, arm_smmu_domain_finalise will be done when the domain is
+ * attached to a device.
+ * @user_cfg: Optional user configuration for the allocation. This allows the
+ * caller, mainly user space, to customize the iommu_domain in the
+ * arm_smmu_domain_finalise function that decodes the user_cfg data.
+ *
+ * This helper function is shared by domain_alloc_user() and domain_alloc().
+ * Unlike ops->domain_alloc, so far only called by the iommu core which sets
+ * default values of domain->type, domain->ops and domain->pgsize_bitmap,
+ * ops->domain_alloc_user can be called directly by the iommufd core. So this
+ * function should set those default values for an ops->domain_alloc_user
+ * call. Note that domain->pgsize_bitmap is set in arm_smmu_domain_finalise().
+ */
+static struct iommu_domain *
+__arm_smmu_domain_alloc(unsigned type,
+ struct arm_smmu_master *master,
+ const struct iommu_hwpt_arm_smmuv3 *user_cfg)
+{
+ struct arm_smmu_domain *smmu_domain;
+ struct iommu_domain *domain;
+ int ret = 0;
+
+ if (type == IOMMU_DOMAIN_SVA)
+ return arm_smmu_sva_domain_alloc();
+
+ if (type != IOMMU_DOMAIN_UNMANAGED &&
+ type != IOMMU_DOMAIN_DMA &&
+ type != IOMMU_DOMAIN_DMA_FQ &&
+ type != IOMMU_DOMAIN_IDENTITY)
+ return NULL;
+
+ /*
+ * Allocate the domain and initialise some of its data structures.
+ * We can't really finalise the domain unless a master is given.
+ */
+ smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
+ if (!smmu_domain)
+ return NULL;
+ domain = &smmu_domain->domain;
+
+ domain->type = type;
+ domain->ops = arm_smmu_ops.default_domain_ops;
+
+ mutex_init(&smmu_domain->init_mutex);
+ INIT_LIST_HEAD(&smmu_domain->devices);
+ spin_lock_init(&smmu_domain->devices_lock);
+ INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
+
+ if (master) {
+ smmu_domain->smmu = master->smmu;
+ ret = arm_smmu_domain_finalise(domain, master, user_cfg);
+ if (ret) {
+ kfree(smmu_domain);
+ return NULL;
+ }
+ }
+
+ return domain;
+}
+
+static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
+{
+ return __arm_smmu_domain_alloc(type, NULL, NULL);
+}
+
+static struct iommu_domain *
+arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
+ const void *user_data)
+{
+ const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ unsigned type = IOMMU_DOMAIN_UNMANAGED;
+
+ return __arm_smmu_domain_alloc(type, master, user_cfg);
+}
+
+static const size_t arm_smmu_domain_user_data_len[] = {
+ [IOMMU_HWPT_TYPE_ARM_SMMUV3] = sizeof(struct iommu_hwpt_arm_smmuv3),
+};
+
static struct iommu_ops arm_smmu_ops = {
.capable = arm_smmu_capable,
.hw_info = arm_smmu_hw_info,
.domain_alloc = arm_smmu_domain_alloc,
+ .domain_alloc_user = arm_smmu_domain_alloc_user,
+ .domain_alloc_user_data_len = arm_smmu_domain_user_data_len,
.probe_device = arm_smmu_probe_device,
.release_device = arm_smmu_release_device,
.set_dev_user_data = arm_smmu_set_dev_user_data,
--
2.40.1
Add domain allocation support for the IOMMU_DOMAIN_NESTED type. This includes
the "finalise" part that records the user space Stream Table Entry info.
Co-developed-by: Eric Auger <[email protected]>
Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 112 +++++++++++++++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
2 files changed, 110 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 75ee928c2390..3d150924581f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2181,6 +2181,76 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
return 0;
}
+static int
+arm_smmu_domain_finalise_nested(struct iommu_domain *domain,
+ struct arm_smmu_master *master,
+ const struct iommu_hwpt_arm_smmuv3 *user_cfg)
+{
+ struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+ struct arm_smmu_device *smmu = smmu_domain->smmu;
+ bool feat_2lvl_cdtab = smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB;
+ bool feat_s1 = smmu->features & ARM_SMMU_FEAT_TRANS_S1;
+ bool feat_s2 = smmu->features & ARM_SMMU_FEAT_TRANS_S2;
+ size_t event_len = EVTQ_ENT_DWORDS * sizeof(u64);
+ size_t ste_len = STRTAB_STE_DWORDS * sizeof(u64);
+ struct device *dev = master->dev;
+ void __user *event_user = NULL;
+ u64 event[EVTQ_ENT_DWORDS];
+ u64 ste[STRTAB_STE_DWORDS];
+ u8 s1dss, s1fmt, s1cdmax;
+ u64 s1ctxptr;
+
+ if (user_cfg->out_event_uptr && user_cfg->event_len == event_len)
+ event_user = u64_to_user_ptr(user_cfg->out_event_uptr);
+ event[0] = FIELD_PREP(EVTQ_0_ID, EVT_ID_BAD_STE);
+
+ if (!feat_s1 || !feat_s2) {
+ dev_dbg(dev, "does not implement two stages\n");
+ if (event_user && copy_to_user(event_user, event, event_len))
+ return -EFAULT;
+ return -EINVAL;
+ }
+
+ if (!user_cfg->ste_uptr || user_cfg->ste_len != ste_len)
+ return -EINVAL;
+ if (copy_from_user(ste, u64_to_user_ptr(user_cfg->ste_uptr), ste_len))
+ return -EFAULT;
+
+ s1dss = FIELD_GET(STRTAB_STE_1_S1DSS, ste[1]);
+
+ s1fmt = FIELD_GET(STRTAB_STE_0_S1FMT, ste[0]);
+ if (!feat_2lvl_cdtab && s1fmt != STRTAB_STE_0_S1FMT_LINEAR) {
+ dev_dbg(dev, "unsupported format (0x%x)\n", s1fmt);
+ if (event_user && copy_to_user(event_user, event, event_len))
+ return -EFAULT;
+ return -EINVAL;
+ }
+
+ s1cdmax = FIELD_GET(STRTAB_STE_0_S1CDMAX, ste[0]);
+ if (s1cdmax > master->ssid_bits) {
+ dev_dbg(dev, "s1cdmax (%d-bit) is out of range (%d-bit)\n",
+ s1cdmax, master->ssid_bits);
+ if (event_user && copy_to_user(event_user, event, event_len))
+ return -EFAULT;
+ return -EINVAL;
+ }
+
+ s1ctxptr = ste[0] & STRTAB_STE_0_S1CTXPTR_MASK;
+ if (s1ctxptr & ~GENMASK_ULL(smmu->ias, 0)) {
+ dev_dbg(dev, "s1ctxptr (0x%llx) is out of range (%lu-bit)\n",
+ s1ctxptr, smmu->ias);
+ if (event_user && copy_to_user(event_user, event, event_len))
+ return -EFAULT;
+ return -EINVAL;
+ }
+
+ smmu_domain->s1_cfg.s1dss = s1dss;
+ smmu_domain->s1_cfg.s1fmt = s1fmt;
+ smmu_domain->s1_cfg.s1cdmax = s1cdmax;
+ smmu_domain->s1_cfg.cdcfg.cdtab_dma = s1ctxptr;
+ return 0;
+}
+
static int arm_smmu_domain_finalise(struct iommu_domain *domain,
struct arm_smmu_master *master,
const struct iommu_hwpt_arm_smmuv3 *user_cfg)
@@ -2202,6 +2272,11 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
return 0;
}
+ if (domain->type == IOMMU_DOMAIN_NESTED) {
+ smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
+ return arm_smmu_domain_finalise_nested(domain, master, user_cfg);
+ }
+
if (user_cfg_s2 && !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
return -EINVAL;
if (user_cfg_s2)
@@ -2898,9 +2973,16 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
}
+static const struct iommu_domain_ops arm_smmu_nested_domain_ops = {
+ .attach_dev = arm_smmu_attach_dev,
+ .free = arm_smmu_domain_free,
+};
+
/**
* __arm_smmu_domain_alloc - Allocate a customizable iommu_domain
* @type: Type of the new iommu_domain, in form of IOMMU_DOMAIN_*
+ * @s2: Optional pointer to a stage-2 domain, used by a stage-1 nested domain
+ *      allocation, pairing with valid user_cfg data to configure the domain.
* @master: Optional master pointer for the allocation. If given, this will be
* used to call arm_smmu_domain_finalise at the end of the allocation.
* Otherwise, arm_smmu_domain_finalise will be done when the domain is
@@ -2918,6 +3000,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
*/
static struct iommu_domain *
__arm_smmu_domain_alloc(unsigned type,
+ struct arm_smmu_domain *s2,
struct arm_smmu_master *master,
const struct iommu_hwpt_arm_smmuv3 *user_cfg)
{
@@ -2929,11 +3012,15 @@ __arm_smmu_domain_alloc(unsigned type,
return arm_smmu_sva_domain_alloc();
if (type != IOMMU_DOMAIN_UNMANAGED &&
+ type != IOMMU_DOMAIN_NESTED &&
type != IOMMU_DOMAIN_DMA &&
type != IOMMU_DOMAIN_DMA_FQ &&
type != IOMMU_DOMAIN_IDENTITY)
return NULL;
+ if (s2 && (s2->stage != ARM_SMMU_DOMAIN_S2 || !user_cfg))
+ return NULL;
+
/*
* Allocate the domain and initialise some of its data structures.
* We can't really finalise the domain unless a master is given.
@@ -2941,10 +3028,14 @@ __arm_smmu_domain_alloc(unsigned type,
smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
if (!smmu_domain)
return NULL;
+ smmu_domain->s2 = s2;
domain = &smmu_domain->domain;
domain->type = type;
- domain->ops = arm_smmu_ops.default_domain_ops;
+ if (s2)
+ domain->ops = &arm_smmu_nested_domain_ops;
+ else
+ domain->ops = arm_smmu_ops.default_domain_ops;
mutex_init(&smmu_domain->init_mutex);
INIT_LIST_HEAD(&smmu_domain->devices);
@@ -2965,7 +3056,7 @@ __arm_smmu_domain_alloc(unsigned type,
static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
{
- return __arm_smmu_domain_alloc(type, NULL, NULL);
+ return __arm_smmu_domain_alloc(type, NULL, NULL, NULL);
}
static struct iommu_domain *
@@ -2975,8 +3066,23 @@ arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
struct arm_smmu_master *master = dev_iommu_priv_get(dev);
unsigned type = IOMMU_DOMAIN_UNMANAGED;
+ struct arm_smmu_domain *s2 = NULL;
+
+ /*
+ * The type of the new domain stays IOMMU_DOMAIN_UNMANAGED, unless a valid
+ * parent domain is given, in which case it becomes IOMMU_DOMAIN_NESTED.
+ * The "stage" of an IOMMU_DOMAIN_UNMANAGED domain, however, is decided in
+ * the arm_smmu_domain_finalise function, which reads user_cfg->flags and
+ * sets the stage accordingly.
+ */
+ if (parent) {
+ if (parent->ops != arm_smmu_ops.default_domain_ops)
+ return NULL;
+ type = IOMMU_DOMAIN_NESTED;
+ s2 = to_smmu_domain(parent);
+ }
- return __arm_smmu_domain_alloc(type, master, user_cfg);
+ return __arm_smmu_domain_alloc(type, s2, master, user_cfg);
}
static const size_t arm_smmu_domain_user_data_len[] = {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 6ed86883d52e..9540926b4598 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -380,6 +380,7 @@
#define EVTQ_0_ID GENMASK_ULL(7, 0)
+#define EVT_ID_BAD_STE 0x4
#define EVT_ID_TRANSLATION_FAULT 0x10
#define EVT_ID_ADDR_SIZE_FAULT 0x11
#define EVT_ID_ACCESS_FAULT 0x12
--
2.40.1
With a nested translation setup, the stage-1 Context Descriptor table is
managed by a guest OS in the user space. So, the kernel driver can no
longer assume that the guest's device driver will never issue the
TLBI_NH_VAA and TLBI_NH_ALL commands.
Add them to arm_smmu_cmdq_build_cmd(), to prepare for supporting these
two TLBI invalidation requests coming from the guest.
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 12 +++++++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5fe7fed825d0..3902dd8cfcfa 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -272,8 +272,17 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
break;
case CMDQ_OP_TLBI_NH_VA:
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
fallthrough;
+ case CMDQ_OP_TLBI_NH_VAA:
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num);
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale);
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
+ cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
+ cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TTL, ent->tlbi.ttl);
+ cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TG, ent->tlbi.tg);
+ cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
+ break;
case CMDQ_OP_TLBI_EL2_VA:
cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num);
cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale);
@@ -295,6 +304,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
case CMDQ_OP_TLBI_NH_ASID:
cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
fallthrough;
+ case CMDQ_OP_TLBI_NH_ALL:
case CMDQ_OP_TLBI_S12_VMALL:
cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
break;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 9540926b4598..43c971af663c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -455,8 +455,10 @@ struct arm_smmu_cmdq_ent {
};
} cfgi;
+ #define CMDQ_OP_TLBI_NH_ALL 0x10
#define CMDQ_OP_TLBI_NH_ASID 0x11
#define CMDQ_OP_TLBI_NH_VA 0x12
+ #define CMDQ_OP_TLBI_NH_VAA 0x13
#define CMDQ_OP_TLBI_EL2_ALL 0x20
#define CMDQ_OP_TLBI_EL2_ASID 0x21
#define CMDQ_OP_TLBI_EL2_VA 0x22
--
2.40.1
In a nested translation setup, the device is attached to a stage-1 domain
that represents the guest-level Context Descriptor table. A Stream Table
Entry for a 2-stage translation needs both the stage-1 Context Descriptor
table info and the stage-2 translation table info, i.e. a pair of s1_cfg
and s2_cfg.
Add an "s2" pointer to struct arm_smmu_domain, so a nested stage-1 domain
can simply navigate to its stage-2 domain for the s2_cfg, whenever the
domain->s2 pointer is set.
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 ++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 74e38abf5f4c..fd3ac4802907 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1289,6 +1289,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
switch (smmu_domain->stage) {
case ARM_SMMU_DOMAIN_S1:
s1_cfg = &smmu_domain->s1_cfg;
+ if (smmu_domain->s2)
+ s2_cfg = &smmu_domain->s2->s2_cfg;
break;
case ARM_SMMU_DOMAIN_S2:
s2_cfg = &smmu_domain->s2_cfg;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index e73707479119..232a2dda5d24 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -711,6 +711,7 @@ enum arm_smmu_domain_stage {
};
struct arm_smmu_domain {
+ struct arm_smmu_domain *s2;
struct arm_smmu_device *smmu;
struct mutex init_mutex; /* Protects smmu pointer */
--
2.40.1
From: Eric Auger <[email protected]>
The value of the STRTAB_STE_0_CFG field can be 0b111 as the configuration
for a 2-stage translation, meaning that both S1 and S2 are valid. Add it
and mark ste_live accordingly.
Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 792e8a788e2e..74e38abf5f4c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1304,6 +1304,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
break;
case STRTAB_STE_0_CFG_S1_TRANS:
case STRTAB_STE_0_CFG_S2_TRANS:
+ case STRTAB_STE_0_CFG_NESTED:
ste_live = true;
break;
case STRTAB_STE_0_CFG_ABORT:
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index e9e45ce7a899..e73707479119 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -208,6 +208,7 @@
#define STRTAB_STE_0_CFG_BYPASS 4
#define STRTAB_STE_0_CFG_S1_TRANS 5
#define STRTAB_STE_0_CFG_S2_TRANS 6
+#define STRTAB_STE_0_CFG_NESTED 7
#define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4)
#define STRTAB_STE_0_S1FMT_LINEAR 0
--
2.40.1
IOMMUFD uses two iommu_domain pointers to represent the two stages. The S1
iommu_domain (IOMMU_DOMAIN_NESTED type) represents the Context Descriptor
table in the user space. The S2 iommu_domain (IOMMU_DOMAIN_UNMANAGED type)
represents the translation table in the kernel, owned by a hypervisor.
So there is no longer a use case for ARM_SMMU_DOMAIN_NESTED. Drop it, and
use the IOMMU_DOMAIN_NESTED type instead.
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 --
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 -
2 files changed, 3 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 327819663547..c57c70132c0b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1291,7 +1291,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
s1_cfg = &smmu_domain->s1_cfg;
break;
case ARM_SMMU_DOMAIN_S2:
- case ARM_SMMU_DOMAIN_NESTED:
s2_cfg = &smmu_domain->s2_cfg;
break;
default:
@@ -2229,7 +2228,6 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
fmt = ARM_64_LPAE_S1;
finalise_stage_fn = arm_smmu_domain_finalise_s1;
break;
- case ARM_SMMU_DOMAIN_NESTED:
case ARM_SMMU_DOMAIN_S2:
ias = smmu->ias;
oas = smmu->oas;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1951a80af241..e9e45ce7a899 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -706,7 +706,6 @@ struct arm_smmu_master {
enum arm_smmu_domain_stage {
ARM_SMMU_DOMAIN_S1 = 0,
ARM_SMMU_DOMAIN_S2,
- ARM_SMMU_DOMAIN_NESTED,
ARM_SMMU_DOMAIN_BYPASS,
};
--
2.40.1
The s1dss field has so far been hardcoded to STRTAB_STE_1_S1DSS_SSID0 in
the STE setup routine. Yet, with nested STE support, it should be set up
with the value coming from the guest OS.
Add s1dss to struct arm_smmu_s1_cfg, and set it to STRTAB_STE_1_S1DSS_SSID0
in arm_smmu_domain_finalise_s1(). It will be set to the value from the
guest OS via a separate pathway for the nested use case.
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 ++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index fd3ac4802907..8b827247f4b9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1346,7 +1346,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
BUG_ON(ste_live);
dst[1] = cpu_to_le64(
- FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
+ FIELD_PREP(STRTAB_STE_1_S1DSS, s1_cfg->s1dss) |
FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
@@ -2144,6 +2144,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
goto out_unlock;
cfg->s1cdmax = master->ssid_bits;
+ cfg->s1dss = STRTAB_STE_1_S1DSS_SSID0;
smmu_domain->stall_enabled = master->stall_enabled;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 232a2dda5d24..6ed86883d52e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -598,6 +598,7 @@ struct arm_smmu_s1_cfg {
struct arm_smmu_ctx_desc_cfg cdcfg;
struct arm_smmu_ctx_desc cd;
u8 s1fmt;
+ u8 s1dss;
u8 s1cdmax;
};
--
2.40.1
Add the following data structures for corresponding ioctls:
iommu_hwpt_arm_smmuv3 => IOMMUFD_CMD_HWPT_ALLOC
iommu_hwpt_invalidate_arm_smmuv3 => IOMMUFD_CMD_HWPT_INVALIDATE
The iommu_hwpt_arm_smmuv3 structure is used to allocate an SMMU-specific
translation table, either a stage-1 Context Descriptor table or a stage-2
translation table, which needs the user Stream Table Entry or just a
simple S2 flag respectively.
The iommu_hwpt_invalidate_arm_smmuv3 structure describes a request for
stage-1 cache invalidations, which may be TLB invalidations, Context
Descriptor invalidations, or Address Translation Cache invalidations. The
structure contains all the necessary information about a user-space
Command Queue, so the kernel can handle all invalidation commands in a
single batch.
Also, add IOMMU_HWPT_TYPE_ARM_SMMUV3 and IOMMU_HW_INFO_TYPE_ARM_SMMUV3.
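As a reference, below is a minimal user-space sketch of how these two
structures could be filled in. The surrounding IOMMU_HWPT_ALLOC and
IOMMU_HWPT_INVALIDATE ioctl plumbing is omitted, and the guest-side
buffers (guest_ste, evt, vcmdq, vcmdq_cons) are hypothetical names for
what a vSMMU emulation would maintain:

  #include <stdint.h>
  #include <linux/iommufd.h>

  /* Hypothetical guest-owned buffers maintained by a vSMMU emulation */
  static uint64_t guest_ste[8];   /* copy of the guest Stream Table Entry */
  static uint64_t evt[4];         /* space for a returned C_BAD_STE event */
  static uint64_t vcmdq[64 * 2];  /* guest command queue, 16-byte entries */
  static uint32_t vcmdq_cons;     /* raw SMMU_CMDQ_CONS value, incl. WRAP */

  /* Stage-2 hwpt: only the S2 flag matters; the other fields are ignored */
  static struct iommu_hwpt_arm_smmuv3 s2_data = {
          .flags = IOMMU_SMMUV3_FLAG_S2,
  };

  /* Nested stage-1 hwpt: pass the guest STE and an optional event buffer */
  static struct iommu_hwpt_arm_smmuv3 s1_data = {
          .ste_len = sizeof(guest_ste),
          .ste_uptr = (uintptr_t)guest_ste,
          .event_len = sizeof(evt),
          .out_event_uptr = (uintptr_t)evt,
  };

  /* Invalidation: point the kernel at the guest command queue directly */
  static struct iommu_hwpt_invalidate_arm_smmuv3 inv_data = {
          .cmdq_uptr = (uintptr_t)vcmdq,
          .cmdq_cons_uptr = (uintptr_t)&vcmdq_cons,
          .cmdq_prod = 4,         /* raw SMMU_CMDQ_PROD value, incl. WRAP */
          .cmdq_entry_size = 16,
          .cmdq_log2size = 6,     /* 64 entries */
  };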
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/iommufd/main.c | 1 +
include/uapi/linux/iommufd.h | 59 ++++++++++++++++++++++++++++++++++++
2 files changed, 60 insertions(+)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 7d7c11c1c912..8461b12c43b7 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -288,6 +288,7 @@ union ucmd_buffer {
*/
struct iommu_hwpt_invalidate_intel_vtd vtd;
struct iommu_hwpt_invalidate_request_intel_vtd req_vtd;
+ struct iommu_hwpt_invalidate_arm_smmuv3 smmuv3;
};
struct iommufd_ioctl_op {
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c8f5801d3439..6845ce6e1e76 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -354,10 +354,12 @@ struct iommu_vfio_ioas {
* enum iommu_hwpt_type - IOMMU HWPT Type
* @IOMMU_HWPT_TYPE_DEFAULT: default
* @IOMMU_HWPT_TYPE_VTD_S1: Intel VT-d stage-1 page table
+ * @IOMMU_HWPT_TYPE_ARM_SMMUV3: ARM SMMUv3 Translation table
*/
enum iommu_hwpt_type {
IOMMU_HWPT_TYPE_DEFAULT,
IOMMU_HWPT_TYPE_VTD_S1,
+ IOMMU_HWPT_TYPE_ARM_SMMUV3,
};
/**
@@ -413,6 +415,32 @@ struct iommu_hwpt_intel_vtd {
__u32 __reserved;
};
+/**
+ * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 specific translation table data
+ *
+ * @flags: Translation table entry attributes
+ * @ste_len: Length of the user Stream Table Entry
+ * @ste_uptr: User pointer to a user Stream Table Entry
+ * @event_len: Length of the returned event
+ * @out_event_uptr: User pointer to a returned event, to report a C_BAD_STE upon
+ * an STE configuration failure
+ *
+ * If event_len or out_event_uptr is unset, i.e. remaining at value 0, an STE
+ * configuration failure during the hwpt allocation will not be reported.
+ *
+ * The data structure can be used to allocate a stage-2 translation table, in
+ * which case only IOMMU_SMMUV3_FLAG_S2 would matter, i.e. all other ste_* and
+ * event_* inputs would be ignored.
+ */
+struct iommu_hwpt_arm_smmuv3 {
+#define IOMMU_SMMUV3_FLAG_S2 (1 << 0) /* if unset, stage-1 */
+ __u64 flags;
+ __u64 ste_len;
+ __aligned_u64 ste_uptr;
+ __u64 event_len;
+ __aligned_u64 out_event_uptr;
+};
+
/**
* struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
* @size: sizeof(struct iommu_hwpt_alloc)
@@ -448,6 +476,8 @@ struct iommu_hwpt_intel_vtd {
* +------------------------------+-------------------------------------+-----------+
* | IOMMU_HWPT_TYPE_VTD_S1 | struct iommu_hwpt_intel_vtd | HWPT |
* +------------------------------+-------------------------------------+-----------+
+ * | IOMMU_HWPT_TYPE_ARM_SMMUV3 | struct iommu_hwpt_arm_smmuv3 | IOAS/HWPT |
+ * +------------------------------+-------------------------------------+-----------+
*/
struct iommu_hwpt_alloc {
__u32 size;
@@ -465,10 +495,12 @@ struct iommu_hwpt_alloc {
/**
* enum iommu_hw_info_type - IOMMU Hardware Info Types
* @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
+ * @IOMMU_HW_INFO_TYPE_ARM_SMMUV3: ARM SMMUv3 iommu info type
*/
enum iommu_hw_info_type {
IOMMU_HW_INFO_TYPE_NONE,
IOMMU_HW_INFO_TYPE_INTEL_VTD,
+ IOMMU_HW_INFO_TYPE_ARM_SMMUV3,
};
/**
@@ -603,6 +635,31 @@ struct iommu_hwpt_invalidate_intel_vtd {
__u64 inv_data_uptr;
};
+/**
+ * struct iommu_hwpt_invalidate_arm_smmuv3 - ARM SMMUv3 cache invalidation info
+ * @cmdq_uptr: User pointer to a user command queue
+ * @cmdq_cons_uptr: User pointer to the consumer index of a user command queue,
+ * allowing kernel to read and also update the consumer index
+ * for a successful call or a failure with a CERROR_ILL code.
+ * This pointer must point to a __u32 type of memory location.
+ * @cmdq_prod: Producer index of user command queues
+ * @cmdq_entry_size: Entry size of a user command queue
+ * @cmdq_log2size: Queue size as log2(entries). Refer to 6.3.25 SMMU_CMDQ_BASE
+ * @__reserved: Must be 0
+ *
+ * Both the consumer index and the producer index should be in their raw forms,
+ * i.e. the raw values out of the SMMU_CMDQ_PROD and SMMU_CMDQ_CONS registers,
+ * which include the WRAP bits as well, rather than just the two index values.
+ */
+struct iommu_hwpt_invalidate_arm_smmuv3 {
+ __aligned_u64 cmdq_uptr;
+ __aligned_u64 cmdq_cons_uptr;
+ __u32 cmdq_prod;
+ __u32 cmdq_entry_size;
+ __u32 cmdq_log2size;
+ __u32 __reserved;
+};
+
/**
* struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
* @size: sizeof(struct iommu_hwpt_invalidate)
@@ -620,6 +677,8 @@ struct iommu_hwpt_invalidate_intel_vtd {
* +------------------------------+----------------------------------------+
* | IOMMU_HWPT_TYPE_VTD_S1 | struct iommu_hwpt_invalidate_intel_vtd |
* +------------------------------+----------------------------------------+
+ * | IOMMU_HWPT_TYPE_ARM_SMMUV3 | struct iommu_hwpt_invalidate_arm_smmuv3|
+ * +------------------------------+----------------------------------------+
*/
struct iommu_hwpt_invalidate {
__u32 size;
--
2.40.1
The IOMMU_RESV_SW_MSI region is a kernel-managed domain thing. So, the
msi_cookie should be set up on a kernel-managed domain only. If the
attaching domain is a user-managed one, redirect the hwpt to hwpt->parent
to do it correctly.
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/iommufd/device.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index e40f576fdf0e..e403cf6c336d 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -351,7 +351,8 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
* call iommu_get_msi_cookie() on its behalf. This is necessary to setup
* the MSI window so iommu_dma_prepare_msi() can install pages into our
* domain after request_irq(). If it is not done interrupts will not
- * work on this domain.
+ * work on this domain. The msi_cookie should always be set on the
+ * kernel-managed (parent) domain.
*
* FIXME: This is conceptually broken for iommufd since we want to allow
* userspace to change the domains, eg switch from an identity IOAS to a
@@ -359,6 +360,8 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
* matches what the IRQ layer actually expects in a newly created
* domain.
*/
+ if (hwpt->parent)
+ hwpt = hwpt->parent;
if (sw_msi_start != PHYS_ADDR_MAX && !hwpt->msi_cookie) {
rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start);
if (rc)
--
2.40.1
In a 1-stage translation setup, a device is attached to an iommu_domain
(ARM_SMMU_DOMAIN_S1) that is a kernel-managed domain.
In a 2-stage translation setup, a device is attached to an iommu_domain
(ARM_SMMU_DOMAIN_S1) of the IOMMU_DOMAIN_NESTED type, which must have a
valid "s2" pointer to an iommu_domain (ARM_SMMU_DOMAIN_S2) that is a
kernel-managed domain.
The MSI mappings for the dma-iommu core to use are located in a domain
that must be a kernel-managed one. Thus, add a function to return the
correct iommu_domain pointer accordingly.
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3d150924581f..5fe7fed825d0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2095,6 +2095,17 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
kfree(smmu_domain);
}
+static struct iommu_domain *
+arm_smmu_get_msi_mapping_domain(struct iommu_domain *domain)
+{
+ struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+ if (smmu_domain->s2)
+ return &smmu_domain->s2->domain;
+
+ return domain;
+}
+
static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
struct arm_smmu_master *master,
struct io_pgtable_cfg *pgtbl_cfg)
@@ -2976,6 +2987,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
static const struct iommu_domain_ops arm_smmu_nested_domain_ops = {
.attach_dev = arm_smmu_attach_dev,
.free = arm_smmu_domain_free,
+ .get_msi_mapping_domain = arm_smmu_get_msi_mapping_domain,
};
/**
--
2.40.1
The arm_smmu_hw_info op is used to forward the host IDR values to the user
space, so the hypervisor and the guest VM can learn about the underlying
hardware's capabilities.
Also, set the ops->hw_info_type and ops->hwpt_type_bitmap accordingly.
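As an illustration, a user-space consumer of IOMMU_DEVICE_GET_HW_INFO
could decode a few capability bits from the forwarded registers as below;
the bit positions (IDR0.ST_LVL in bits [28:27], S1P in bit 1, S2P in
bit 0) follow the SMMUv3 spec, and the ioctl call itself is elided:

  #include <stdint.h>
  #include <stdio.h>
  #include <linux/iommufd.h>

  /* Decode a couple of capability bits from the forwarded IDR registers */
  static void report_smmu_caps(const struct iommu_hw_info_smmuv3 *info)
  {
          uint32_t idr0 = info->idr[0];

          /* IDR0.ST_LVL == 0b01 means a 2-level Stream table is supported */
          printf("2-level stream table: %s\n",
                 ((idr0 >> 27) & 0x3) == 1 ? "yes" : "no");

          /* IDR0.S1P and IDR0.S2P report stage-1/stage-2 support */
          printf("stage-1: %s, stage-2: %s\n",
                 (idr0 & (1 << 1)) ? "yes" : "no",
                 (idr0 & (1 << 0)) ? "yes" : "no");
  }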
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 +++++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++
include/uapi/linux/iommufd.h | 14 +++++++++++
3 files changed, 42 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2c53849cae30..89daf50be87b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2014,6 +2014,29 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
}
}
+static void *arm_smmu_hw_info(struct device *dev, u32 *length)
+{
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ struct iommu_hw_info_smmuv3 *info;
+ void *base_idr;
+ int i;
+
+ if (!master || !master->smmu)
+ return ERR_PTR(-ENODEV);
+
+ info = kzalloc(sizeof(*info), GFP_KERNEL);
+ if (!info)
+ return ERR_PTR(-ENOMEM);
+
+ base_idr = master->smmu->base + ARM_SMMU_IDR0;
+ for (i = 0; i <= 5; i++)
+ info->idr[i] = readl_relaxed(base_idr + 0x4 * i);
+
+ *length = sizeof(*info);
+
+ return info;
+}
+
static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
{
struct arm_smmu_domain *smmu_domain;
@@ -2846,6 +2869,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
static struct iommu_ops arm_smmu_ops = {
.capable = arm_smmu_capable,
+ .hw_info = arm_smmu_hw_info,
.domain_alloc = arm_smmu_domain_alloc,
.probe_device = arm_smmu_probe_device,
.release_device = arm_smmu_release_device,
@@ -2857,6 +2881,8 @@ static struct iommu_ops arm_smmu_ops = {
.dev_disable_feat = arm_smmu_dev_disable_feature,
.page_response = arm_smmu_page_response,
.def_domain_type = arm_smmu_def_domain_type,
+ .hw_info_type = IOMMU_HW_INFO_TYPE_ARM_SMMUV3,
+ .hwpt_type_bitmap = BIT_ULL(IOMMU_HWPT_TYPE_ARM_SMMUV3),
.pgsize_bitmap = -1UL, /* Restricted during device attach */
.owner = THIS_MODULE,
.default_domain_ops = &(const struct iommu_domain_ops) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index b574c58a3487..5fa010c6fe20 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -14,6 +14,8 @@
#include <linux/mmzone.h>
#include <linux/sizes.h>
+#include <uapi/linux/iommufd.h>
+
/* MMIO registers */
#define ARM_SMMU_IDR0 0x0
#define IDR0_ST_LVL GENMASK(28, 27)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 6845ce6e1e76..b1fe4c59ab82 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -534,6 +534,20 @@ struct iommu_hw_info_vtd {
__aligned_u64 ecap_reg;
};
+/**
+ * struct iommu_hw_info_smmuv3 - ARM SMMUv3 device info
+ *
+ * @flags: Must be set to 0
+ * @__reserved: Must be 0
+ * @idr: Implemented features for the SMMU Non-secure programming interface.
+ * Please refer to the chapters from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
+ */
+struct iommu_hw_info_smmuv3 {
+ __u32 flags;
+ __u32 __reserved;
+ __u32 idr[6];
+};
+
/**
* struct iommu_hw_info - ioctl(IOMMU_DEVICE_GET_HW_INFO)
* @size: sizeof(struct iommu_hw_info)
--
2.40.1
Implement the new set_dev_user_data and unset_dev_user_data ops, using
an xarray to store the stream pointer indexed by a given user Stream ID.
This will be used by the user cache invalidation path, to look up the
SID field of an ATC_INV command and replace it with the physical SID.
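For instance, a VMM that exposes the device to its guest with a virtual
Stream ID would pass that vSID down when binding the device. The bind
ioctl wiring comes from the vfio cdev series and is elided here; the
value below is just an arbitrary example:

  #include <linux/iommufd.h>

  /* vSID chosen by the VMM for this device in the guest's Stream table */
  static struct iommu_device_data_arm_smmuv3 dev_data = {
          .sid = 0x10,    /* must be non-zero, matching the driver check */
  };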
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 44 +++++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +
include/uapi/linux/iommufd.h | 10 +++++
3 files changed, 56 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 89daf50be87b..327819663547 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2746,6 +2746,46 @@ static void arm_smmu_release_device(struct device *dev)
kfree(master);
}
+static int arm_smmu_set_dev_user_data(struct device *dev, const void *user_data)
+{
+ const struct iommu_device_data_arm_smmuv3 *user = user_data;
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ struct arm_smmu_stream *stream = &master->streams[0];
+ struct arm_smmu_device *smmu = master->smmu;
+ u32 sid_user = user->sid;
+ int ret = 0;
+
+ if (!sid_user)
+ return -EINVAL;
+
+ ret = xa_alloc(&smmu->streams_user, &sid_user, stream,
+ XA_LIMIT(sid_user, sid_user), GFP_KERNEL_ACCOUNT);
+ if (ret)
+ return ret;
+ stream->id_user = sid_user;
+ return 0;
+}
+
+static void arm_smmu_unset_dev_user_data(struct device *dev)
+{
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ struct arm_smmu_stream *stream = &master->streams[0];
+ struct arm_smmu_device *smmu = master->smmu;
+ u32 sid_user = stream->id_user;
+
+ if (!sid_user)
+ return;
+
+ xa_lock(&smmu->streams_user);
+ stream = __xa_erase(&smmu->streams_user, sid_user);
+ if (stream != master->streams) {
+ WARN_ON(__xa_alloc(&smmu->streams_user, &sid_user, stream,
+ XA_LIMIT(sid_user, sid_user),
+ GFP_KERNEL_ACCOUNT));
+ }
+ xa_unlock(&smmu->streams_user);
+}
+
static struct iommu_group *arm_smmu_device_group(struct device *dev)
{
struct iommu_group *group;
@@ -2873,6 +2913,9 @@ static struct iommu_ops arm_smmu_ops = {
.domain_alloc = arm_smmu_domain_alloc,
.probe_device = arm_smmu_probe_device,
.release_device = arm_smmu_release_device,
+ .set_dev_user_data = arm_smmu_set_dev_user_data,
+ .unset_dev_user_data = arm_smmu_unset_dev_user_data,
+ .dev_user_data_len = sizeof(struct iommu_device_data_arm_smmuv3),
.device_group = arm_smmu_device_group,
.of_xlate = arm_smmu_of_xlate,
.get_resv_regions = arm_smmu_get_resv_regions,
@@ -3108,6 +3151,7 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
mutex_init(&smmu->streams_mutex);
smmu->streams = RB_ROOT;
+ xa_init_flags(&smmu->streams_user, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
ret = arm_smmu_init_queues(smmu);
if (ret)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 5fa010c6fe20..1951a80af241 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -676,10 +676,12 @@ struct arm_smmu_device {
struct rb_root streams;
struct mutex streams_mutex;
+ struct xarray streams_user;
};
struct arm_smmu_stream {
u32 id;
+ u32 id_user;
struct arm_smmu_master *master;
struct rb_node node;
};
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index b1fe4c59ab82..2ecb6240ec69 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -592,6 +592,16 @@ struct iommu_hw_info {
};
#define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)
+/**
+ * struct iommu_device_data_arm_smmuv3 - ARM SMMUv3 specific device data
+ * @sid: The Stream ID that is assigned in the user space
+ *
+ * This should be passed via the VFIO_DEVICE_BIND_IOMMUFD ioctl.
+ */
+struct iommu_device_data_arm_smmuv3 {
+ __u32 sid;
+};
+
/**
* enum iommu_hwpt_intel_vtd_invalidate_flags - Flags for Intel VT-d
* stage-1 page table cache
--
2.40.1
From: Robin Murphy <[email protected]>
Currently, iommu-dma is the only place outside of IOMMUFD and drivers
which might need to be aware of the stage 2 domain encapsulated within
a nested domain. This would be in the legacy-VFIO-style case where we're
using host-managed MSIs with an identity mapping at stage 1, where it is
the underlying stage 2 domain which owns an MSI cookie and holds the
corresponding dynamic mappings. Hook up the new op to resolve what we
need from a nested domain.
Signed-off-by: Robin Murphy <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/dma-iommu.c | 18 ++++++++++++++++--
include/linux/iommu.h | 4 ++++
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7a9f0b0bddbd..da2b28eb9723 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1642,6 +1642,20 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
return NULL;
}
+/*
+ * Nested domains may not have an MSI cookie or accept mappings, but they may
+ * be related to a domain which does, so we let them tell us what they need.
+ */
+static struct iommu_domain *iommu_dma_get_msi_mapping_domain(struct device *dev)
+{
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+ if (domain && domain->type == IOMMU_DOMAIN_NESTED &&
+ domain->ops->get_msi_mapping_domain)
+ domain = domain->ops->get_msi_mapping_domain(domain);
+ return domain;
+}
+
/**
* iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain
* @desc: MSI descriptor, will store the MSI page
@@ -1652,7 +1666,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
{
struct device *dev = msi_desc_to_dev(desc);
- struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+ struct iommu_domain *domain = iommu_dma_get_msi_mapping_domain(dev);
struct iommu_dma_msi_page *msi_page;
static DEFINE_MUTEX(msi_prepare_lock); /* see below */
@@ -1685,7 +1699,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
{
struct device *dev = msi_desc_to_dev(desc);
- const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+ const struct iommu_domain *domain = iommu_dma_get_msi_mapping_domain(dev);
const struct iommu_dma_msi_page *msi_page;
msi_page = msi_desc_get_iommu_cookie(desc);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 470b088a13f7..cb517898f55c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -362,6 +362,8 @@ struct iommu_ops {
* specific mechanisms.
* @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
* @free: Release the domain after use.
+ * @get_msi_mapping_domain: Return the related iommu_domain that should hold the
+ * MSI cookie and accept mapping(s).
*/
struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
@@ -396,6 +398,8 @@ struct iommu_domain_ops {
unsigned long quirks);
void (*free)(struct iommu_domain *domain);
+ struct iommu_domain *
+ (*get_msi_mapping_domain)(struct iommu_domain *domain);
};
/**
--
2.40.1
Add an arm_smmu_cache_invalidate_user() function for user space to invalidate
TLB entries and Context Descriptors, since an IO page table entry or a
Context Descriptor in the user space may still be cached by the hardware.
The input user_data is defined in struct iommu_hwpt_invalidate_arm_smmuv3,
which contains the essential info of the command queue of the guest OS. It
will be read by the host kernel to get direct access to the guest queue and
fetch all invalidation commands. The SID and VMID fields of those commands
are also fixed up by replacing them with the correct physical values.
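To illustrate the batching, here is a small sketch of how the raw
SMMU_CMDQ_PROD/CONS values (index plus WRAP bit) translate into the
number of commands to pull from the guest queue; the helper and the
numbers are illustrative only, mirroring the ncmds computation in the
function below:

  #include <stdbool.h>
  #include <stdint.h>

  /* Illustrative only: count pending entries from raw PROD/CONS values */
  static unsigned int pending_cmds(uint32_t prod, uint32_t cons,
                                   unsigned int log2size)
  {
          unsigned int nents = 1U << log2size;
          unsigned int prod_idx = prod & (nents - 1);
          unsigned int cons_idx = cons & (nents - 1);
          bool prod_wrap = prod & nents;
          bool cons_wrap = cons & nents;

          /* e.g. prod = 0x42, cons = 0x3e, log2size = 6  ->  4 commands */
          if (prod_wrap == cons_wrap)
                  return prod_idx - cons_idx;
          return nents - cons_idx + prod_idx;
  }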
Co-developed-by: Eric Auger <[email protected]>
Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 113 ++++++++++++++++++++
1 file changed, 113 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3902dd8cfcfa..124f63fe52e1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2994,10 +2994,123 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
}
+static int arm_smmu_fix_user_cmd(struct arm_smmu_domain *smmu_domain, u64 *cmd)
+{
+ struct arm_smmu_stream *stream;
+
+ switch (*cmd & CMDQ_0_OP) {
+ case CMDQ_OP_TLBI_NSNH_ALL:
+ *cmd &= ~CMDQ_0_OP;
+ *cmd |= CMDQ_OP_TLBI_NH_ALL;
+ fallthrough;
+ case CMDQ_OP_TLBI_NH_VA:
+ case CMDQ_OP_TLBI_NH_VAA:
+ case CMDQ_OP_TLBI_NH_ALL:
+ case CMDQ_OP_TLBI_NH_ASID:
+ *cmd &= ~CMDQ_TLBI_0_VMID;
+ *cmd |= FIELD_PREP(CMDQ_TLBI_0_VMID,
+ smmu_domain->s2->s2_cfg.vmid);
+ break;
+ case CMDQ_OP_ATC_INV:
+ case CMDQ_OP_CFGI_CD:
+ case CMDQ_OP_CFGI_CD_ALL:
+ xa_lock(&smmu_domain->smmu->streams_user);
+ stream = xa_load(&smmu_domain->smmu->streams_user,
+ FIELD_GET(CMDQ_CFGI_0_SID, *cmd));
+ xa_unlock(&smmu_domain->smmu->streams_user);
+ if (!stream)
+ return -ENODEV;
+ *cmd &= ~CMDQ_CFGI_0_SID;
+ *cmd |= FIELD_PREP(CMDQ_CFGI_0_SID, stream->id);
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+ pr_debug("Fixed user CMD: %016llx : %016llx\n", cmd[1], cmd[0]);
+
+ return 0;
+}
+
+static int arm_smmu_cache_invalidate_user(struct iommu_domain *domain,
+ void *user_data)
+{
+ const u32 cons_err = FIELD_PREP(CMDQ_CONS_ERR, CMDQ_ERR_CERROR_ILL_IDX);
+ struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+ struct iommu_hwpt_invalidate_arm_smmuv3 *inv = user_data;
+ struct arm_smmu_device *smmu = smmu_domain->smmu;
+ struct arm_smmu_queue q = {
+ .llq = {
+ .prod = inv->cmdq_prod,
+ .max_n_shift = inv->cmdq_log2size,
+ },
+ .base = u64_to_user_ptr(inv->cmdq_uptr),
+ .ent_dwords = inv->cmdq_entry_size / sizeof(u64),
+ };
+ unsigned int nents = 1 << q.llq.max_n_shift;
+ void __user *cons_uptr;
+ int ncmds, i = 0;
+ u32 prod, cons;
+ u64 *cmds;
+ int ret;
+
+ if (!smmu || !smmu_domain->s2 || domain->type != IOMMU_DOMAIN_NESTED)
+ return -EINVAL;
+ if (q.ent_dwords != CMDQ_ENT_DWORDS)
+ return -EINVAL;
+ WARN_ON(q.llq.max_n_shift > smmu->cmdq.q.llq.max_n_shift);
+
+ cons_uptr = u64_to_user_ptr(inv->cmdq_cons_uptr);
+ if (copy_from_user(&q.llq.cons, cons_uptr, sizeof(u32)))
+ return -EFAULT;
+ if (queue_empty(&q.llq))
+ return -EINVAL;
+
+ prod = Q_IDX(&q.llq, q.llq.prod);
+ cons = Q_IDX(&q.llq, q.llq.cons);
+ if (Q_WRP(&q.llq, q.llq.prod) == Q_WRP(&q.llq, q.llq.cons))
+ ncmds = prod - cons;
+ else
+ ncmds = nents - cons + prod;
+ cmds = kcalloc(ncmds, inv->cmdq_entry_size, GFP_KERNEL);
+ if (!cmds)
+ return -ENOMEM;
+
+ do {
+ u64 *cmd = &cmds[i * CMDQ_ENT_DWORDS];
+
+ ret = copy_from_user(cmd, Q_ENT(&q, q.llq.cons),
+ inv->cmdq_entry_size);
+ if (ret) {
+ ret = -EFAULT;
+ goto out_free_cmds;
+ }
+
+ ret = arm_smmu_fix_user_cmd(smmu_domain, cmd);
+ if (ret && ret != -EOPNOTSUPP) {
+ q.llq.cons |= cons_err;
+ goto out_copy_cons;
+ }
+ if (!ret)
+ i++;
+ queue_inc_cons(&q.llq);
+ } while (!queue_empty(&q.llq));
+
+ ret = arm_smmu_cmdq_issue_cmdlist(smmu, cmds, i, true);
+out_copy_cons:
+ if (copy_to_user(cons_uptr, &q.llq.cons, sizeof(u32)))
+ ret = -EFAULT;
+out_free_cmds:
+ kfree(cmds);
+ return ret;
+}
+
static const struct iommu_domain_ops arm_smmu_nested_domain_ops = {
.attach_dev = arm_smmu_attach_dev,
.free = arm_smmu_domain_free,
.get_msi_mapping_domain = arm_smmu_get_msi_mapping_domain,
+ .cache_invalidate_user = arm_smmu_cache_invalidate_user,
+ .cache_invalidate_user_data_len =
+ sizeof(struct iommu_hwpt_invalidate_arm_smmuv3),
};
/**
--
2.40.1
From: Jason Gunthorpe <[email protected]>
This control causes the ARM SMMU drivers to choose a stage 2
implementation for the IO pagetable (vs the stage 1 usual default),
however this choice has no visible impact on the VFIO user. Further, qemu
never implemented this and no other userspace user is known.
The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add
new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide
SMMU translation services to the guest operating system" however the rest
of the API to set the guest table pointer for the stage 1 was never
completed, or at least never upstreamed, rendering this part useless dead
code.
Since the current patches to enable nested translation, aka userspace page
tables, rely on iommufd and will not use the enable_nesting()
iommu_domain_op, remove this infrastructure. However, don't cut too deep
into the SMMU drivers for now expecting the iommufd work to pick it up -
we still need to create S2 IO page tables.
Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the
enable_nesting iommu_domain_op.
Just in case there is some userspace using this, continue to treat
requesting it as a NOP, but do not advertise support any more.
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ----------------
drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 ----------------
drivers/iommu/iommu.c | 10 ----------
drivers/iommu/iommufd/vfio_compat.c | 7 +------
drivers/vfio/vfio_iommu_type1.c | 12 +-----------
include/linux/iommu.h | 3 ---
include/uapi/linux/vfio.h | 2 +-
7 files changed, 3 insertions(+), 63 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3fd83fb75722..2c53849cae30 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2740,21 +2740,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
return group;
}
-static int arm_smmu_enable_nesting(struct iommu_domain *domain)
-{
- struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
- int ret = 0;
-
- mutex_lock(&smmu_domain->init_mutex);
- if (smmu_domain->smmu)
- ret = -EPERM;
- else
- smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
- mutex_unlock(&smmu_domain->init_mutex);
-
- return ret;
-}
-
static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
{
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2881,7 +2866,6 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all = arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys = arm_smmu_iova_to_phys,
- .enable_nesting = arm_smmu_enable_nesting,
.free = arm_smmu_domain_free,
}
};
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6e0813b26fb6..97d3fbcbd7f3 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1484,21 +1484,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
return group;
}
-static int arm_smmu_enable_nesting(struct iommu_domain *domain)
-{
- struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
- int ret = 0;
-
- mutex_lock(&smmu_domain->init_mutex);
- if (smmu_domain->smmu)
- ret = -EPERM;
- else
- smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
- mutex_unlock(&smmu_domain->init_mutex);
-
- return ret;
-}
-
static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks)
{
@@ -1579,7 +1564,6 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all = arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys = arm_smmu_iova_to_phys,
- .enable_nesting = arm_smmu_enable_nesting,
.set_pgtable_quirks = arm_smmu_set_pgtable_quirks,
.free = arm_smmu_domain_free,
}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b44fd1a76997..13a2e0e26884 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2680,16 +2680,6 @@ static int __init iommu_init(void)
}
core_initcall(iommu_init);
-int iommu_enable_nesting(struct iommu_domain *domain)
-{
- if (domain->type != IOMMU_DOMAIN_UNMANAGED)
- return -EINVAL;
- if (!domain->ops->enable_nesting)
- return -EINVAL;
- return domain->ops->enable_nesting(domain);
-}
-EXPORT_SYMBOL_GPL(iommu_enable_nesting);
-
int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirk)
{
diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c
index fe02517c73cc..2c5a523b5354 100644
--- a/drivers/iommu/iommufd/vfio_compat.c
+++ b/drivers/iommu/iommufd/vfio_compat.c
@@ -291,12 +291,7 @@ static int iommufd_vfio_check_extension(struct iommufd_ctx *ictx,
case VFIO_DMA_CC_IOMMU:
return iommufd_vfio_cc_iommu(ictx);
- /*
- * This is obsolete, and to be removed from VFIO. It was an incomplete
- * idea that got merged.
- * https://lore.kernel.org/kvm/[email protected]/
- */
- case VFIO_TYPE1_NESTING_IOMMU:
+ case __VFIO_RESERVED_TYPE1_NESTING_IOMMU:
return 0;
/*
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3d4dd9420c30..0bce568f129a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -72,7 +72,6 @@ struct vfio_iommu {
uint64_t pgsize_bitmap;
uint64_t num_non_pinned_groups;
bool v2;
- bool nesting;
bool dirty_page_tracking;
struct list_head emulated_iommu_groups;
};
@@ -2200,12 +2199,6 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (!domain->domain)
goto out_free_domain;
- if (iommu->nesting) {
- ret = iommu_enable_nesting(domain->domain);
- if (ret)
- goto out_domain;
- }
-
ret = iommu_attach_group(domain->domain, group->iommu_group);
if (ret)
goto out_domain;
@@ -2546,9 +2539,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
switch (arg) {
case VFIO_TYPE1_IOMMU:
break;
- case VFIO_TYPE1_NESTING_IOMMU:
- iommu->nesting = true;
- fallthrough;
+ case __VFIO_RESERVED_TYPE1_NESTING_IOMMU:
case VFIO_TYPE1v2_IOMMU:
iommu->v2 = true;
break;
@@ -2643,7 +2634,6 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
switch (arg) {
case VFIO_TYPE1_IOMMU:
case VFIO_TYPE1v2_IOMMU:
- case VFIO_TYPE1_NESTING_IOMMU:
case VFIO_UNMAP_ALL:
return 1;
case VFIO_UPDATE_VADDR:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1e1124f79b56..470b088a13f7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -360,7 +360,6 @@ struct iommu_ops {
* @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE,
* including no-snoop TLPs on PCIe or other platform
* specific mechanisms.
- * @enable_nesting: Enable nesting
* @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
* @free: Release the domain after use.
*/
@@ -393,7 +392,6 @@ struct iommu_domain_ops {
dma_addr_t iova);
bool (*enforce_cache_coherency)(struct iommu_domain *domain);
- int (*enable_nesting)(struct iommu_domain *domain);
int (*set_pgtable_quirks)(struct iommu_domain *domain,
unsigned long quirks);
@@ -566,7 +564,6 @@ extern int iommu_page_response(struct device *dev,
extern int iommu_group_id(struct iommu_group *group);
extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
-int iommu_enable_nesting(struct iommu_domain *domain);
int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index e57112f63760..ecbd013ae6ea 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -35,7 +35,7 @@
#define VFIO_EEH 5
/* Two-stage IOMMU */
-#define VFIO_TYPE1_NESTING_IOMMU 6 /* Implies v2 */
+#define __VFIO_RESERVED_TYPE1_NESTING_IOMMU 6 /* Implies v2 */
#define VFIO_SPAPR_TCE_v2_IOMMU 7
--
2.40.1
From: Eric Auger <[email protected]>
Although the spec does not seem to mention this, on some implementations,
when the STE configuration switches from an S1+S2 config to an S1-only one,
a C_BAD_STE error can happen if dst[3] (S2TTB) is not reset.
Explicitly reset the two upper 64-bit fields (dst[2] and dst[3]) to prevent that.
Note that this is not a bug at the moment, since a 2-stage translation
setup is not enabled until the following patches add support for it.
Reported-by: Shameer Kolothum <[email protected]>
Link: https://patchwork.kernel.org/cover/11449895/#23244457
Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c57c70132c0b..792e8a788e2e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1373,6 +1373,17 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
+ } else {
+ /*
+ * Unset dst[2] and dst[3] to clear the stage-2 configuration. This was
+ * observed on a HiSilicon implementation where, once the SMMUv3 has been
+ * configured with both stage-1 and stage-2 enabled, it cannot be switched
+ * back to stage-1-only mode for the same device (stream ID). The SMMUv3
+ * implementation on these boards expects the S2TTB field in the STE to be
+ * zero when using S1, otherwise it reports a C_BAD_STE error.
+ */
+ dst[2] = 0;
+ dst[3] = 0;
}
if (master->ats_enabled)
--
2.40.1
> From: Nicolin Chen <[email protected]>
> Sent: Wednesday, May 10, 2023 11:33 AM
>
> One unique requirement for SMMUv3 nested translation support is the MSI
> doorbell address translation, which is a 2-stage translation too. And,
> to working with the ITS driver, an msi_cookie needs to be setup on the
> kernel-managed domain, the stage-2 domain of the nesting setup. And the
> same msi_cookie will be fetched, via
> iommu_dma_get_msi_mapping_domain(),
> in the iommu core to allocate and creates IOVA mappings for MSI doorbell
> page(s). However, with the nesting design, the device is attached to a
> user-managed domain, the stage-1 domain. So both the setup and fetching
> of the msi_cookie would not work at the level of stage-2 domain. Thus,
> on both sides, the msi_cookie setup and fetching require a redirection
> of the domain pointer. It's easy to do so in iommufd core, but needs a
> new op in the iommu core and driver.
>
It looks like the new preferred way is to map the physical ITS page to an IPA
provided by QEMU, then let the guest allocate the cookie in S1, which
is then passed back by QEMU to the host kernel? [1]
[1] https://lore.kernel.org/linux-iommu/[email protected]/
On Wed, May 10, 2023 at 08:11:28AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <[email protected]>
> > Sent: Wednesday, May 10, 2023 11:33 AM
> >
> > One unique requirement for SMMUv3 nested translation support is the MSI
> > doorbell address translation, which is a 2-stage translation too. And,
> > to working with the ITS driver, an msi_cookie needs to be setup on the
> > kernel-managed domain, the stage-2 domain of the nesting setup. And the
> > same msi_cookie will be fetched, via
> > iommu_dma_get_msi_mapping_domain(),
> > in the iommu core to allocate and creates IOVA mappings for MSI doorbell
> > page(s). However, with the nesting design, the device is attached to a
> > user-managed domain, the stage-1 domain. So both the setup and fetching
> > of the msi_cookie would not work at the level of stage-2 domain. Thus,
> > on both sides, the msi_cookie setup and fetching require a redirection
> > of the domain pointer. It's easy to do so in iommufd core, but needs a
> > new op in the iommu core and driver.
> >
>
> It looks like the new preferred way is to map the physical ITS page to an IPA
> provided by QEMU, then let the guest allocate the cookie in S1, which
> is then passed back by QEMU to the host kernel? [1]
>
> [1] https://lore.kernel.org/linux-iommu/[email protected]/
Hmm.. is that something firm enough to implement at this stage?
Thank you
Nicolin
Hi, Nico
On Wed, 10 May 2023 at 11:34, Nicolin Chen <[email protected]> wrote:
>
> [ This series is rebased on top of v6.4-rc1 merging Jason's iommu_hwpt
> branch and Yi's vfio cdev v11 branch, then the replace v7 series and
> the nesting v2 (candidate) series and Intel VT-d series. Note that
> some of them are still getting finalized. So, there can be potential
> minor API changes that would not be reflected in this series. Yet, we
> can start the review at the SMMU driver specific things.
>
> @robin, the hw_info patch still requires the errata patch that you
> mentioned. Perhaps we can merge that separately or include it in v3.
>
> Thanks! ]
>
> Changelog
> v2:
> * Added arm_smmu_set_dev_data after the set_dev_data series.
> * Added Jason's patch "vfio: Remove VFIO_TYPE1_NESTING_IOMMU"
> * Replaced the iommu_get_unmanaged_domain() helper with Robin's patch.
> * Reworked the code in arm_smmu_cmdq_build_cmd() to make NH_VA to be
> a superset of NH_VAA.
> * Added inline comments and a bug-report link to the patch unsetting
> dst[2] and dst[3] of STE.
> * Dropped the to_s2_cfg helper since only one place really needs it.
> * Dropped the VMID (override) flag and s2vmid in iommu_hwpt_arm_smmuv3
> structure, because it's expected for user space to use a shared S2
> domain/hwpt for all devices, i.e. the VMID (allocated with the S2
> domain is already unified. If there's some special case that still
> needs a VMID unification, we should probably add it incrementally.
> * Move the introduction of the "struct arm_smmu_domain *s2" function
> parameter to the proper patch.
> * Redefined "struct iommu_hwpt_arm_smmuv3" by adding ste_uptr/len and
> out_event_uptr/len. Then added an arm_smmu_domain_finalise_nested()
> function to read guest Stream Table Entry with a proper sanity.
> * Reworked arm_smmu_cache_invalidate_user() by reading the guest CMDQ
> directly, to support batching. Also, added return value feedback of
> -ETIMEDOUT at CMD_SYNC, and reported CERROR_ILL errors via the CONS
> in the user_data structure.
> * Updated data/functions following the nesting infrastructure updates.
> * Added/fixed multiple comments per v1 review inputs.
> v1:
> https://lore.kernel.org/all/[email protected]/
>
> --------------------------------------------------------------------------
>
> Hi all,
>
> This series of patches add nested translation support for ARM SMMUv3.
>
> Eric Auger made a huge effort previously with the VFIO uAPIs, and sent
> his v16 a year ago. Now, the nested translation should follow the new
> IOMMUFD uAPIs design. So, most of the key features are ported from the
> privous VFIO solution, and then rebuilt on top of the IOMMUFD nesting
> infrastructure.
>
> The essential parts in the driver to support a nested translation are
> ->hw_info, ->domain_alloc_user and ->cache_invalidate_user ops. So this
> series fundamentally adds these three functions in the SMMUv3 driver,
> along with several preparations and cleanups for them.
>
> One unique requirement for SMMUv3 nested translation support is the MSI
> doorbell address translation, which is a 2-stage translation too. And,
> to working with the ITS driver, an msi_cookie needs to be setup on the
> kernel-managed domain, the stage-2 domain of the nesting setup. And the
> same msi_cookie will be fetched, via iommu_dma_get_msi_mapping_domain(),
> in the iommu core to allocate and creates IOVA mappings for MSI doorbell
> page(s). However, with the nesting design, the device is attached to a
> user-managed domain, the stage-1 domain. So both the setup and fetching
> of the msi_cookie would not work at the level of stage-2 domain. Thus,
> on both sides, the msi_cookie setup and fetching require a redirection
> of the domain pointer. It's easy to do so in iommufd core, but needs a
> new op in the iommu core and driver.
>
> You can also find this series on the Github:
> https://github.com/nicolinc/iommufd/commits/iommufd_nesting-v2
>
> The kernel branch is tested with this QEMU branch:
> https://github.com/nicolinc/qemu/commits/wip/iommufd_rfcv4+nesting+smmuv3-v2
>
I rebased on these two branches and did some basic tests.
The basic functions work after backporting
iommufd: Add IOMMU_PAGE_RESPONSE
iommufd: Add device fault handler support
https://github.com/Linaro/linux-kernel-warpdrive/tree/uacce-devel-6.4
https://github.com/Linaro/qemu/tree/iommufd-6.4-nesting-smmuv3-v2
However, when debugging PCI device hotplug, it still does not work;
segmentation fault, same as on 6.2.
guest kernel
CONFIG_HOTPLUG_PCI_PCIE=y
boot guest (this info does not appear in 6.2)
qemu-system-aarch64: -device
vfio-pci,host=0000:76:00.1,bus=pci.1,addr=0x0,id=acc1,iommufd=iommufd0:
Failed to set data -1
qemu-system-aarch64: -device
vfio-pci,host=0000:76:00.1,bus=pci.1,addr=0x0,id=acc1,iommufd=iommufd0:
failed to set device data
$ sudo nc -U /tmp/qmpm_1.socket
(qemu) info pci
(qemu) device_del acc1
guest:
qemu-system-aarch64: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64: vfio_container_dma_unmap(0xaaaae1fc0380,
0x8000000000, 0x10000) = -2 (No such file or directory)
qemu-system-aarch64: Failed to unset data -1
Segmentation fault (core dumped). // also happened in 6.2
Thanks
Hi Zhangfei,
On Mon, May 15, 2023 at 06:00:26PM +0800, Zhangfei Gao wrote:
> I rebased on these two branches and did some basic tests.
>
> The basic functions work after backport
> iommufd: Add IOMMU_PAGE_RESPONSE
> iommufd: Add device fault handler support
>
> https://github.com/Linaro/linux-kernel-warpdrive/tree/uacce-devel-6.4
> https://github.com/Linaro/qemu/tree/iommufd-6.4-nesting-smmuv3-v2
Thanks for testing!
> However when debugging hotplug PCI device, it still does not work,
> Segmentation fault same as 6.2.
>
> guest kernel
> CONFIG_HOTPLUG_PCI_PCIE=y
>
> boot guest (this info does not appear in 6.2)
> qemu-system-aarch64: -device
> vfio-pci,host=0000:76:00.1,bus=pci.1,addr=0x0,id=acc1,iommufd=iommufd0:
> Failed to set data -1
> qemu-system-aarch64: -device
> vfio-pci,host=0000:76:00.1,bus=pci.1,addr=0x0,id=acc1,iommufd=iommufd0:
> failed to set device data
Hmm.. I wonder what fails the set_dev_data ioctl...
> $ sudo nc -U /tmp/qmpm_1.socket
> (qemu) info pci
> (qemu) device_del acc1
>
> guest:
> qemu-system-aarch64: IOMMU_IOAS_UNMAP failed: No such file or directory
> qemu-system-aarch64: vfio_container_dma_unmap(0xaaaae1fc0380,
> 0x8000000000, 0x10000) = -2 (No such file or directory)
This results from the following commit, which we should
drop later:
commit c4fd2efd7c02dd30491adf676c1b0aed67656f36
Author: Yi Liu <[email protected]>
Date: Thu Apr 27 05:47:03 2023 -0700
vfio/container: Skip readonly pages
This is a temporary solution for the Intel platform due to an errata in
which readonly pages in the second-stage page table are exclusive with
nested support.
Signed-off-by: Yi Liu <[email protected]>
> qemu-system-aarch64: Failed to unset data -1
> Segmentation fault (core dumped). // also happened in 6.2
Hmm, would it be possible for you to run the test again by
adding the following tracers to your QEMU command?
--trace "iommufd*" \
--trace "smmu*" \
--trace "vfio_*" \
--trace "pci_*"
Thanks
Nic
On Mon, 15 May 2023 at 23:58, Nicolin Chen <[email protected]> wrote:
>
> Hi Zhangfei,
>
> On Mon, May 15, 2023 at 06:00:26PM +0800, Zhangfei Gao wrote:
>
> > I rebased on these two branches and did some basic tests.
> >
> > The basic functions work after backport
> > iommufd: Add IOMMU_PAGE_RESPONSE
> > iommufd: Add device fault handler support
> >
> > https://github.com/Linaro/linux-kernel-warpdrive/tree/uacce-devel-6.4
> > https://github.com/Linaro/qemu/tree/iommufd-6.4-nesting-smmuv3-v2
>
> Thanks for testing!
>
> > However when debugging hotplug PCI device, it still does not work,
> > Segmentation fault same as 6.2.
> >
> > guest kernel
> > CONFIG_HOTPLUG_PCI_PCIE=y
> >
> > boot guest (this info does not appear in 6.2)
> > qemu-system-aarch64: -device
> > vfio-pci,host=0000:76:00.1,bus=pci.1,addr=0x0,id=acc1,iommufd=iommufd0:
> > Failed to set data -1
> > qemu-system-aarch64: -device
> > vfio-pci,host=0000:76:00.1,bus=pci.1,addr=0x0,id=acc1,iommufd=iommufd0:
> > failed to set device data
>
> Hmm.. I wonder what fails the set_dev_data ioctl...
After some quick debugging, it is because dev_data.sid=0, causing
arm_smmu_set_dev_user_data() to fail:
hw/arm/smmu-common.c
smmu_dev_set_iommu_device
.sid = smmu_get_sid(sdev)
smmu_dev_set_iommu_device dev_data.sid=0
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
arm_smmu_set_dev_user_data
u32 sid_user = user->sid;
if (!sid_user) return -EINVAL;
>
> > $ sudo nc -U /tmp/qmpm_1.socket
> > (qemu) info pci
> > (qemu) device_del acc1
> >
> > guest:
> > qemu-system-aarch64: IOMMU_IOAS_UNMAP failed: No such file or directory
> > qemu-system-aarch64: vfio_container_dma_unmap(0xaaaae1fc0380,
> > 0x8000000000, 0x10000) = -2 (No such file or directory)
>
From an earlier email reply:
(Eric) In the QEMU arm virt machine, 0x8000000000 matches the PCI MMIO region.
(Yi) Currently, the iommufd kernel part doesn't support mapping device BAR MMIO.
This is a known gap.
> This is resulted from the following commit that we should
> drop later:
>
> commit c4fd2efd7c02dd30491adf676c1b0aed67656f36
> Author: Yi Liu <[email protected]>
> Date: Thu Apr 27 05:47:03 2023 -0700
>
> vfio/container: Skip readonly pages
>
> This is a temparary solution for Intel platform due to an errata in
> which readonly pages in second stage page table is exclusive with
> nested support.
>
> Signed-off-by: Yi Liu <[email protected]>
>
>
> > qemu-system-aarch64: Failed to unset data -1
> > Segmentation fault (core dumped). // also happened in 6.2
>
> Hmm, would it be possible for you to run the test again by
> adding the following tracers to your QEMU command?
> --trace "iommufd*" \
> --trace "smmu*" \
> --trace "vfio_*" \
> --trace "pci_*"
>
Have sent you the log directly, since it is too big.
Thanks
On Tue, May 16, 2023 at 11:12:44AM +0800, Zhangfei Gao wrote:
> > > However when debugging hotplug PCI device, it still does not work,
> > > Segmentation fault same as 6.2.
> > >
> > > guest kernel
> > > CONFIG_HOTPLUG_PCI_PCIE=y
> > >
> > > boot guest (this info does not appear in 6.2)
> > > qemu-system-aarch64: -device
> > > vfio-pci,host=0000:76:00.1,bus=pci.1,addr=0x0,id=acc1,iommufd=iommufd0:
> > > Failed to set data -1
> > > qemu-system-aarch64: -device
> > > vfio-pci,host=0000:76:00.1,bus=pci.1,addr=0x0,id=acc1,iommufd=iommufd0:
> > > failed to set device data
> >
> > Hmm.. I wonder what fails the set_dev_data ioctl...
> Simply debug, it is because dev_data.sid=0, causing
> arm_smmu_set_dev_user_data fail
I found that too. The input PCI bus number is 1, yet in the
context of set_dev_data, the PCI bus number is 0, which
resulted in a 0-valued sid. I will take another look to figure
out why.
> > > $ sudo nc -U /tmp/qmpm_1.socket
> > > (qemu) info pci
> > > (qemu) device_del acc1
> > >
> > > guest:
> > > qemu-system-aarch64: IOMMU_IOAS_UNMAP failed: No such file or directory
> > > qemu-system-aarch64: vfio_container_dma_unmap(0xaaaae1fc0380,
> > > 0x8000000000, 0x10000) = -2 (No such file or directory)
> >
> From ex-email reply
> (Eric) In qemu arm virt machine 0x8000000000 matches the PCI MMIO region.
> (Yi) Currently, iommufd kernel part doesn't support mapping device BAR MMIO.
> This is a known gap.
OK.
> > > qemu-system-aarch64: Failed to unset data -1
> > > Segmentation fault (core dumped). // also happened in 6.2
> >
> > Hmm, would it be possible for you to run the test again by
> > adding the following tracers to your QEMU command?
> > --trace "iommufd*" \
> > --trace "smmu*" \
> > --trace "vfio_*" \
> > --trace "pci_*"
> >
>
> Have sent you the log directly, since it is too big.
I have found two missing pieces in the device detach routine.
Applying the following should fix the crash in the hotplug path.
----------------------------------------------------------------------------
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 89a256efa999..2344307523cb 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -151,8 +151,10 @@ void vfio_container_destroy(VFIOContainer *container)
}
QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
- memory_region_unregister_iommu_notifier(
- MEMORY_REGION(giommu->iommu_mr), &giommu->n);
+ if (giommu->n.notifier_flags) {
+ memory_region_unregister_iommu_notifier(
+ MEMORY_REGION(giommu->iommu_mr), &giommu->n);
+ }
QLIST_REMOVE(giommu, giommu_next);
g_free(giommu);
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 844c60892db2..35d31480390d 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -652,6 +652,9 @@ found:
*/
if (QLIST_EMPTY(&container->hwpt_list)) {
vfio_as_del_container(space, bcontainer);
+ if (bcontainer->nested) {
+ memory_listener_unregister(& bcontainer->prereg_listener);
+ }
}
__vfio_device_detach_container(vbasedev, container, &err);
if (err) {
----------------------------------------------------------------------------
Would you please try your case with it?
Thanks
Nic
On 2023/5/26 07:42, Nicolin Chen wrote:
> On Tue, May 16, 2023 at 11:12:44AM +0800, Zhangfei Gao wrote:
>
>>>> However when debugging hotplug PCI device, it still does not work,
>>>> Segmentation fault same as 6.2.
>>>>
>>>> guest kernel
>>>> CONFIG_HOTPLUG_PCI_PCIE=y
>>>>
>>>> boot guest (this info does not appear in 6.2)
>>>> qemu-system-aarch64: -device
>>>> vfio-pci,host=0000:76:00.1,bus=pci.1,addr=0x0,id=acc1,iommufd=iommufd0:
>>>> Failed to set data -1
>>>> qemu-system-aarch64: -device
>>>> vfio-pci,host=0000:76:00.1,bus=pci.1,addr=0x0,id=acc1,iommufd=iommufd0:
>>>> failed to set device data
>>> Hmm.. I wonder what fails the set_dev_data ioctl...
>> Simply debug, it is because dev_data.sid=0, causing
>> arm_smmu_set_dev_user_data fail
> I found that too. The input pci bus number is 1, yet the in
> the context of set_dev_data, the pci bus number is 0, which
> resulted in a 0-valued sid. I will take another look to get
> why.
>
>>>> $ sudo nc -U /tmp/qmpm_1.socket
>>>> (qemu) info pci
>>>> (qemu) device_del acc1
>>>>
>>>> guest:
>>>> qemu-system-aarch64: IOMMU_IOAS_UNMAP failed: No such file or directory
>>>> qemu-system-aarch64: vfio_container_dma_unmap(0xaaaae1fc0380,
>>>> 0x8000000000, 0x10000) = -2 (No such file or directory)
>> From ex-email reply
>> (Eric) In qemu arm virt machine 0x8000000000 matches the PCI MMIO region.
>> (Yi) Currently, iommufd kernel part doesn't support mapping device BAR MMIO.
>> This is a known gap.
> OK.
>
>>>> qemu-system-aarch64: Failed to unset data -1
>>>> Segmentation fault (core dumped). // also happened in 6.2
>>> Hmm, would it be possible for you to run the test again by
>>> adding the following tracers to your QEMU command?
>>> --trace "iommufd*" \
>>> --trace "smmu*" \
>>> --trace "vfio_*" \
>>> --trace "pci_*"
>>>
>> Have sent you the log directly, since it is too big.
> I have found two missing pieces in the device detach routine.
> Applying the following should fix the crash at hotplug path.
>
> ----------------------------------------------------------------------------
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 89a256efa999..2344307523cb 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -151,8 +151,10 @@ void vfio_container_destroy(VFIOContainer *container)
> }
>
> QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
> - memory_region_unregister_iommu_notifier(
> - MEMORY_REGION(giommu->iommu_mr), &giommu->n);
> + if (giommu->n.notifier_flags) {
> + memory_region_unregister_iommu_notifier(
> + MEMORY_REGION(giommu->iommu_mr), &giommu->n);
> + }
> QLIST_REMOVE(giommu, giommu_next);
> g_free(giommu);
> }
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 844c60892db2..35d31480390d 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -652,6 +652,9 @@ found:
> */
> if (QLIST_EMPTY(&container->hwpt_list)) {
> vfio_as_del_container(space, bcontainer);
> + if (bcontainer->nested) {
> + memory_listener_unregister(& bcontainer->prereg_listener);
> + }
> }
> __vfio_device_detach_container(vbasedev, container, &err);
> if (err) {
> ----------------------------------------------------------------------------
>
> Would you please try your case with it?
Yes, this solves the hotplug segmentation fault.
It still reports:
qemu-system-aarch64: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64: vfio_container_dma_unmap(0xaaaae622e300,
0x8000000000, 0x10000) = -2 (No such file or directory)
qemu-system-aarch64: Failed to unset data -1 (only on the first
device_del)
Tested with device_del and device_add.
Thanks.
On Fri, May 26, 2023 at 09:58:52AM +0800, zhangfei gao wrote:
> > I have found two missing pieces in the device detach routine.
> > Applying the following should fix the crash at hotplug path.
> >
> > ----------------------------------------------------------------------------
> > diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> > index 89a256efa999..2344307523cb 100644
> > --- a/hw/vfio/container-base.c
> > +++ b/hw/vfio/container-base.c
> > @@ -151,8 +151,10 @@ void vfio_container_destroy(VFIOContainer *container)
> > }
> >
> > QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
> > - memory_region_unregister_iommu_notifier(
> > - MEMORY_REGION(giommu->iommu_mr), &giommu->n);
> > + if (giommu->n.notifier_flags) {
> > + memory_region_unregister_iommu_notifier(
> > + MEMORY_REGION(giommu->iommu_mr), &giommu->n);
> > + }
> > QLIST_REMOVE(giommu, giommu_next);
> > g_free(giommu);
> > }
> > diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> > index 844c60892db2..35d31480390d 100644
> > --- a/hw/vfio/iommufd.c
> > +++ b/hw/vfio/iommufd.c
> > @@ -652,6 +652,9 @@ found:
> > */
> > if (QLIST_EMPTY(&container->hwpt_list)) {
> > vfio_as_del_container(space, bcontainer);
> > + if (bcontainer->nested) {
> > + memory_listener_unregister(& bcontainer->prereg_listener);
> > + }
> > }
> > __vfio_device_detach_container(vbasedev, container, &err);
> > if (err) {
> > ----------------------------------------------------------------------------
> >
> > Would you please try your case with it?
>
>
> Yes, this solve the hotplug segmentation fault
Nice. Thanks!
> Still report
>
> qemu-system-aarch64: IOMMU_IOAS_UNMAP failed: No such file or directory
> qemu-system-aarch64: vfio_container_dma_unmap(0xaaaae622e300,
> 0x8000000000, 0x10000) = -2 (No such file or directory)
> qemu-system-aarch64: Failed to unset data -1 (only the first time of
> device_del)
>
> Test with device_del and device_add
I found that "pci.1" has secondary bus number 0 when the VM initializes:
(qemu) info pci
[...]
Bus 0, device 2, function 0:
PCI bridge: PCI device 1b36:000c
IRQ 0, pin A
BUS 0.
secondary bus 0.
subordinate bus 0.
IO range [0xf000, 0x0fff]
memory range [0xfff00000, 0x000fffff]
prefetchable memory range [0xfff00000, 0x000fffff]
BAR0: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
id "pci.1"
Then it changes later as the guest OS boots:
(qemu) info pci
[...]
Bus 0, device 2, function 0:
PCI bridge: PCI device 1b36:000c
IRQ 255, pin A
BUS 0.
secondary bus 1.
subordinate bus 1.
IO range [0x0000, 0x0fff]
memory range [0x10000000, 0x101fffff]
prefetchable memory range [0x8000000000, 0x80000fffff]
BAR0: 32 bit memory at 0x10240000 [0x10240fff].
id "pci.1"
This must be related to PCI bus initialization, since the bus
numbers and ranges listed in the first dump above are not yet
fully or correctly assigned.
I will try to figure out what's going on, because calling our
->set_iommu_device callback doesn't make much sense if the
PCIBus isn't fully ready.
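For reference, here is a rough sketch of why the SID ends up as 0 in this
situation, assuming the usual QEMU derivation of the SMMUv3 StreamID from the
bus/devfn pair (illustrative only, not necessarily the exact code in the
branch under test):

    #include "hw/pci/pci.h"          /* PCI_BUILD_BDF(), pci_bus_num() */
    #include "hw/arm/smmu-common.h"  /* SMMUDevice */

    /* StreamID derivation for a device behind the emulated SMMUv3 */
    static inline uint16_t sketch_get_sid(SMMUDevice *sdev)
    {
        /* PCI_BUILD_BDF(bus, devfn) == (bus << 8) | devfn */
        return PCI_BUILD_BDF(pci_bus_num(sdev->bus), sdev->devfn);
    }

    /*
     * If set_dev_data runs before the guest has programmed the bridge's
     * secondary bus number, pci_bus_num() still returns 0. With the device
     * in slot 0, function 0, devfn is also 0, so the computed SID is 0 and
     * arm_smmu_set_dev_user_data() in the kernel rejects it with -EINVAL.
     */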
Alternatively, I could move the set_dev_data ioctl out of the
->set_iommu_device callback to a later stage.
Overall, this should be fixed in the next version.
Thank you
Nicolin