2020-11-18 11:26:05

by Eric Auger

[permalink] [raw]
Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

This series brings the IOMMU part of HW nested paging support
in the SMMUv3. The VFIO part is submitted separately.

The IOMMU API is extended to support 2 new API functionalities:
1) pass the guest stage 1 configuration
2) pass stage 1 MSI bindings

Then those capabilities gets implemented in the SMMUv3 driver.

The virtualizer passes information through the VFIO user API
which cascades them to the iommu subsystem. This allows the guest
to own stage 1 tables and context descriptors (so-called PASID
table) while the host owns stage 2 tables and main configuration
structures (STE).

Best Regards

Eric

This series can be found at:
https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
(including the VFIO part in his last version: v11)

The series includes a patch from Jean-Philippe. It is better to
review the original patch:
[PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure

The VFIO series is sent separately.

History:

v12 -> v13:
- fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
reported by Shameer. This urged me to revisit patch 4 into
iommu/smmuv3: Allow s1 and s2 configs to coexist where
s1_cfg and s2_cfg are not dynamically allocated anymore.
Instead I use a new set field in existing structs
- fixed 2 others config checks
- Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
according to the last version

v11 -> v12:
- rebase on top of v5.10-rc4

Eric Auger (14):
iommu: Introduce attach/detach_pasid_table API
iommu: Introduce bind/unbind_guest_msi
iommu/smmuv3: Allow s1 and s2 configs to coexist
iommu/smmuv3: Get prepared for nested stage support
iommu/smmuv3: Implement attach/detach_pasid_table
iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
iommu/smmuv3: Implement cache_invalidate
dma-iommu: Implement NESTED_MSI cookie
iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
regions
iommu/smmuv3: Implement bind/unbind_guest_msi
iommu/smmuv3: Report non recoverable faults
iommu/smmuv3: Accept configs with more than one context descriptor
iommu/smmuv3: Add PASID cache invalidation per PASID

Jean-Philippe Brucker (1):
iommu/arm-smmu-v3: Maintain a SID->device structure

drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 659 ++++++++++++++++++--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 103 ++-
drivers/iommu/dma-iommu.c | 142 ++++-
drivers/iommu/iommu.c | 105 ++++
include/linux/dma-iommu.h | 16 +
include/linux/iommu.h | 41 ++
include/uapi/linux/iommu.h | 54 ++
7 files changed, 1042 insertions(+), 78 deletions(-)

--
2.21.3


2020-11-18 11:26:07

by Eric Auger

[permalink] [raw]
Subject: [PATCH v13 04/15] iommu/smmuv3: Allow s1 and s2 configs to coexist

In true nested mode, both s1_cfg and s2_cfg will coexist.
Let's remove the union and add a "set" field in each
config structure telling whether the config is set and needs
to be applied when writing the STE. In legacy nested mode,
only the 2d stage is used. In true nested mode, the "set" field
will be set when the guest passes the pasid table.

Signed-off-by: Eric Auger <[email protected]>

---
v12 -> v13:
- does not dynamically allocate s1-cfg and s2_cfg anymore. Add
the set field
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++--------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 ++--
2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 1e4acc7f3d3c..18ac5af1b284 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1195,8 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
u64 val = le64_to_cpu(dst[0]);
bool ste_live = false;
struct arm_smmu_device *smmu = NULL;
- struct arm_smmu_s1_cfg *s1_cfg = NULL;
- struct arm_smmu_s2_cfg *s2_cfg = NULL;
+ struct arm_smmu_s1_cfg *s1_cfg;
+ struct arm_smmu_s2_cfg *s2_cfg;
struct arm_smmu_domain *smmu_domain = NULL;
struct arm_smmu_cmdq_ent prefetch_cmd = {
.opcode = CMDQ_OP_PREFETCH_CFG,
@@ -1211,13 +1211,24 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
}

if (smmu_domain) {
+ s1_cfg = &smmu_domain->s1_cfg;
+ s2_cfg = &smmu_domain->s2_cfg;
+
switch (smmu_domain->stage) {
case ARM_SMMU_DOMAIN_S1:
- s1_cfg = &smmu_domain->s1_cfg;
+ s1_cfg->set = true;
+ s2_cfg->set = false;
break;
case ARM_SMMU_DOMAIN_S2:
+ s1_cfg->set = false;
+ s2_cfg->set = true;
+ break;
case ARM_SMMU_DOMAIN_NESTED:
- s2_cfg = &smmu_domain->s2_cfg;
+ /*
+ * Actual usage of stage 1 depends on nested mode:
+ * legacy (2d stage only) or true nested mode
+ */
+ s2_cfg->set = true;
break;
default:
break;
@@ -1244,7 +1255,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
val = STRTAB_STE_0_V;

/* Bypass/fault */
- if (!smmu_domain || !(s1_cfg || s2_cfg)) {
+ if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
if (!smmu_domain && disable_bypass)
val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
else
@@ -1263,7 +1274,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
return;
}

- if (s1_cfg) {
+ if (s1_cfg->set) {
BUG_ON(ste_live);
dst[1] = cpu_to_le64(
FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
@@ -1282,7 +1293,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
}

- if (s2_cfg) {
+ if (s2_cfg->set) {
BUG_ON(ste_live);
dst[2] = cpu_to_le64(
FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
@@ -1846,24 +1857,24 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
{
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
+ struct arm_smmu_s1_cfg *s1_cfg = &smmu_domain->s1_cfg;
+ struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;

iommu_put_dma_cookie(domain);
free_io_pgtable_ops(smmu_domain->pgtbl_ops);

/* Free the CD and ASID, if we allocated them */
- if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
- struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
-
+ if (s1_cfg->set) {
/* Prevent SVA from touching the CD while we're freeing it */
mutex_lock(&arm_smmu_asid_lock);
- if (cfg->cdcfg.cdtab)
+ if (s1_cfg->cdcfg.cdtab)
arm_smmu_free_cd_tables(smmu_domain);
- arm_smmu_free_asid(&cfg->cd);
+ arm_smmu_free_asid(&s1_cfg->cd);
mutex_unlock(&arm_smmu_asid_lock);
- } else {
- struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
- if (cfg->vmid)
- arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
+ }
+ if (s2_cfg->set) {
+ if (s2_cfg->vmid)
+ arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
}

kfree(smmu_domain);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 19196eea7c1d..07f59252dd21 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -562,12 +562,14 @@ struct arm_smmu_s1_cfg {
struct arm_smmu_ctx_desc cd;
u8 s1fmt;
u8 s1cdmax;
+ bool set;
};

struct arm_smmu_s2_cfg {
u16 vmid;
u64 vttbr;
u64 vtcr;
+ bool set;
};

struct arm_smmu_strtab_cfg {
@@ -678,10 +680,8 @@ struct arm_smmu_domain {
atomic_t nr_ats_masters;

enum arm_smmu_domain_stage stage;
- union {
- struct arm_smmu_s1_cfg s1_cfg;
- struct arm_smmu_s2_cfg s2_cfg;
- };
+ struct arm_smmu_s1_cfg s1_cfg;
+ struct arm_smmu_s2_cfg s2_cfg;

struct iommu_domain domain;

--
2.21.3

2020-11-18 11:26:22

by Eric Auger

[permalink] [raw]
Subject: [PATCH v13 11/15] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions

Nested mode currently is not compatible with HW MSI reserved regions.
Indeed MSI transactions targeting this MSI doorbells bypass the SMMU.

Let's check nested mode is not attempted in such configuration.

Signed-off-by: Eric Auger <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 23 +++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5a05c2074c8a..ccf2fef10b69 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2296,6 +2296,23 @@ static bool arm_smmu_share_msi_domain(struct iommu_domain *domain,
return share;
}

+static bool arm_smmu_has_hw_msi_resv_region(struct device *dev)
+{
+ struct iommu_resv_region *region;
+ bool has_msi_resv_region = false;
+ LIST_HEAD(resv_regions);
+
+ iommu_get_resv_regions(dev, &resv_regions);
+ list_for_each_entry(region, &resv_regions, list) {
+ if (region->type == IOMMU_RESV_MSI) {
+ has_msi_resv_region = true;
+ break;
+ }
+ }
+ iommu_put_resv_regions(dev, &resv_regions);
+ return has_msi_resv_region;
+}
+
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
{
int ret = 0;
@@ -2350,10 +2367,12 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
/*
* In nested mode we must check all devices belonging to the
* domain share the same physical MSI doorbell. Otherwise nested
- * stage MSI binding is not supported.
+ * stage MSI binding is not supported. Also nested mode is not
+ * compatible with MSI HW reserved regions.
*/
if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
- !arm_smmu_share_msi_domain(domain, dev)) {
+ (!arm_smmu_share_msi_domain(domain, dev) ||
+ arm_smmu_has_hw_msi_resv_region(dev))) {
ret = -EINVAL;
goto out_unlock;
}
--
2.21.3

2020-11-18 11:26:24

by Eric Auger

[permalink] [raw]
Subject: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

When nested stage translation is setup, both s1_cfg and
s2_cfg are set.

We introduce a new smmu domain abort field that will be set
upon guest stage1 configuration passing.

arm_smmu_write_strtab_ent() is modified to write both stage
fields in the STE and deal with the abort field.

In nested mode, only stage 2 is "finalized" as the host does
not own/configure the stage 1 context descriptor; guest does.

Signed-off-by: Eric Auger <[email protected]>

---
v10 -> v11:
- Fix an issue reported by Shameer when switching from with vSMMU
to without vSMMU. Despite the spec does not seem to mention it
seems to be needed to reset the 2 high 64b when switching from
S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
On some implementations, if the S2TTB is not reset, this causes
a C_BAD_STE error
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +++++++++++++++++----
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +
2 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 18ac5af1b284..412ea1bafa50 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
* three cases at the moment:
*
* 1. Invalid (all zero) -> bypass/fault (init)
- * 2. Bypass/fault -> translation/bypass (attach)
- * 3. Translation/bypass -> bypass/fault (detach)
+ * 2. Bypass/fault -> single stage translation/bypass (attach)
+ * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
+ * 4. S2 -> S1 + S2 (attach_pasid_table)
+ * 5. S1 + S2 -> S2 (detach_pasid_table)
*
* Given that we can't update the STE atomically and the SMMU
* doesn't read the thing in a defined order, that leaves us
@@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
* 3. Update Config, sync
*/
u64 val = le64_to_cpu(dst[0]);
- bool ste_live = false;
+ bool s1_live = false, s2_live = false, ste_live;
+ bool abort, nested = false, translate = false;
struct arm_smmu_device *smmu = NULL;
struct arm_smmu_s1_cfg *s1_cfg;
struct arm_smmu_s2_cfg *s2_cfg;
@@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
default:
break;
}
+ nested = s1_cfg->set && s2_cfg->set;
+ translate = s1_cfg->set || s2_cfg->set;
}

if (val & STRTAB_STE_0_V) {
@@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
case STRTAB_STE_0_CFG_BYPASS:
break;
case STRTAB_STE_0_CFG_S1_TRANS:
+ s1_live = true;
+ break;
case STRTAB_STE_0_CFG_S2_TRANS:
- ste_live = true;
+ s2_live = true;
+ break;
+ case STRTAB_STE_0_CFG_NESTED:
+ s1_live = true;
+ s2_live = true;
break;
case STRTAB_STE_0_CFG_ABORT:
- BUG_ON(!disable_bypass);
break;
default:
BUG(); /* STE corruption */
}
}

+ ste_live = s1_live || s2_live;
+
/* Nuke the existing STE_0 value, as we're going to rewrite it */
val = STRTAB_STE_0_V;

/* Bypass/fault */
- if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
- if (!smmu_domain && disable_bypass)
+
+ if (!smmu_domain)
+ abort = disable_bypass;
+ else
+ abort = smmu_domain->abort;
+
+ if (abort || !translate) {
+ if (abort)
val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
else
val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
@@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
return;
}

+ BUG_ON(ste_live && !nested);
+
+ if (ste_live) {
+ /* First invalidate the live STE */
+ dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
+ arm_smmu_sync_ste_for_sid(smmu, sid);
+ }
+
if (s1_cfg->set) {
- BUG_ON(ste_live);
+ BUG_ON(s1_live);
dst[1] = cpu_to_le64(
FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
@@ -1294,7 +1320,14 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
}

if (s2_cfg->set) {
- BUG_ON(ste_live);
+ u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
+
+ if (s2_live) {
+ u64 s2ttb = le64_to_cpu(dst[3] & STRTAB_STE_3_S2TTB_MASK);
+
+ BUG_ON(s2ttb != vttbr);
+ }
+
dst[2] = cpu_to_le64(
FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
@@ -1304,9 +1337,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
STRTAB_STE_2_S2R);

- dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
+ dst[3] = cpu_to_le64(vttbr);

val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
+ } else {
+ dst[2] = 0;
+ dst[3] = 0;
}

if (master->ats_enabled)
@@ -1982,6 +2018,14 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
return 0;
}

+ if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
+ (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
+ !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))) {
+ dev_info(smmu_domain->smmu->dev,
+ "does not implement two stages\n");
+ return -EINVAL;
+ }
+
/* Restrict the stage to what we can actually support */
if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 07f59252dd21..269779dee8d1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -206,6 +206,7 @@
#define STRTAB_STE_0_CFG_BYPASS 4
#define STRTAB_STE_0_CFG_S1_TRANS 5
#define STRTAB_STE_0_CFG_S2_TRANS 6
+#define STRTAB_STE_0_CFG_NESTED 7

#define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4)
#define STRTAB_STE_0_S1FMT_LINEAR 0
@@ -682,6 +683,7 @@ struct arm_smmu_domain {
enum arm_smmu_domain_stage stage;
struct arm_smmu_s1_cfg s1_cfg;
struct arm_smmu_s2_cfg s2_cfg;
+ bool abort;

struct iommu_domain domain;

--
2.21.3

2020-11-18 11:26:50

by Eric Auger

[permalink] [raw]
Subject: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

From: Jean-Philippe Brucker <[email protected]>

When handling faults from the event or PRI queue, we need to find the
struct device associated to a SID. Add a rb_tree to keep track of SIDs.

Signed-off-by: Jean-Philippe Brucker <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 161 ++++++++++++++++----
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 13 +-
2 files changed, 144 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e634bbe60573..1e4acc7f3d3c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -911,8 +911,8 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,

spin_lock_irqsave(&smmu_domain->devices_lock, flags);
list_for_each_entry(master, &smmu_domain->devices, domain_head) {
- for (i = 0; i < master->num_sids; i++) {
- cmd.cfgi.sid = master->sids[i];
+ for (i = 0; i < master->num_streams; i++) {
+ cmd.cfgi.sid = master->streams[i].id;
arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
}
}
@@ -1350,6 +1350,32 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
return 0;
}

+__maybe_unused
+static struct arm_smmu_master *
+arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
+{
+ struct rb_node *node;
+ struct arm_smmu_stream *stream;
+ struct arm_smmu_master *master = NULL;
+
+ mutex_lock(&smmu->streams_mutex);
+ node = smmu->streams.rb_node;
+ while (node) {
+ stream = rb_entry(node, struct arm_smmu_stream, node);
+ if (stream->id < sid) {
+ node = node->rb_right;
+ } else if (stream->id > sid) {
+ node = node->rb_left;
+ } else {
+ master = stream->master;
+ break;
+ }
+ }
+ mutex_unlock(&smmu->streams_mutex);
+
+ return master;
+}
+
/* IRQ and event handlers */
static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
{
@@ -1569,8 +1595,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)

arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);

- for (i = 0; i < master->num_sids; i++) {
- cmd.atc.sid = master->sids[i];
+ for (i = 0; i < master->num_streams; i++) {
+ cmd.atc.sid = master->streams[i].id;
arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
}

@@ -1613,8 +1639,8 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
if (!master->ats_enabled)
continue;

- for (i = 0; i < master->num_sids; i++) {
- cmd.atc.sid = master->sids[i];
+ for (i = 0; i < master->num_streams; i++) {
+ cmd.atc.sid = master->streams[i].id;
arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
}
}
@@ -2027,13 +2053,13 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
int i, j;
struct arm_smmu_device *smmu = master->smmu;

- for (i = 0; i < master->num_sids; ++i) {
- u32 sid = master->sids[i];
+ for (i = 0; i < master->num_streams; ++i) {
+ u32 sid = master->streams[i].id;
__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);

/* Bridged PCI devices may end up with duplicated IDs */
for (j = 0; j < i; j++)
- if (master->sids[j] == sid)
+ if (master->streams[j].id == sid)
break;
if (j < i)
continue;
@@ -2306,11 +2332,101 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
return sid < limit;
}

+static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
+ struct arm_smmu_master *master)
+{
+ int i;
+ int ret = 0;
+ struct arm_smmu_stream *new_stream, *cur_stream;
+ struct rb_node **new_node, *parent_node = NULL;
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
+
+ master->streams = kcalloc(fwspec->num_ids,
+ sizeof(struct arm_smmu_stream), GFP_KERNEL);
+ if (!master->streams)
+ return -ENOMEM;
+ master->num_streams = fwspec->num_ids;
+
+ mutex_lock(&smmu->streams_mutex);
+ for (i = 0; i < fwspec->num_ids && !ret; i++) {
+ u32 sid = fwspec->ids[i];
+
+ new_stream = &master->streams[i];
+ new_stream->id = sid;
+ new_stream->master = master;
+
+ /*
+ * Check the SIDs are in range of the SMMU and our stream table
+ */
+ if (!arm_smmu_sid_in_range(smmu, sid)) {
+ ret = -ERANGE;
+ break;
+ }
+
+ /* Ensure l2 strtab is initialised */
+ if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
+ ret = arm_smmu_init_l2_strtab(smmu, sid);
+ if (ret)
+ break;
+ }
+
+ /* Insert into SID tree */
+ new_node = &(smmu->streams.rb_node);
+ while (*new_node) {
+ cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
+ node);
+ parent_node = *new_node;
+ if (cur_stream->id > new_stream->id) {
+ new_node = &((*new_node)->rb_left);
+ } else if (cur_stream->id < new_stream->id) {
+ new_node = &((*new_node)->rb_right);
+ } else {
+ dev_warn(master->dev,
+ "stream %u already in tree\n",
+ cur_stream->id);
+ ret = -EINVAL;
+ break;
+ }
+ }
+
+ if (!ret) {
+ rb_link_node(&new_stream->node, parent_node, new_node);
+ rb_insert_color(&new_stream->node, &smmu->streams);
+ }
+ }
+
+ if (ret) {
+ for (; i > 0; i--)
+ rb_erase(&master->streams[i].node, &smmu->streams);
+ kfree(master->streams);
+ }
+ mutex_unlock(&smmu->streams_mutex);
+
+ return ret;
+}
+
+static void arm_smmu_remove_master(struct arm_smmu_master *master)
+{
+ int i;
+ struct arm_smmu_device *smmu = master->smmu;
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
+
+ if (!smmu || !master->streams)
+ return;
+
+ mutex_lock(&smmu->streams_mutex);
+ for (i = 0; i < fwspec->num_ids; i++)
+ rb_erase(&master->streams[i].node, &smmu->streams);
+ mutex_unlock(&smmu->streams_mutex);
+
+ kfree(master->streams);
+}
+
static struct iommu_ops arm_smmu_ops;

static struct iommu_device *arm_smmu_probe_device(struct device *dev)
{
- int i, ret;
+ int ret;
struct arm_smmu_device *smmu;
struct arm_smmu_master *master;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -2331,27 +2447,12 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)

master->dev = dev;
master->smmu = smmu;
- master->sids = fwspec->ids;
- master->num_sids = fwspec->num_ids;
INIT_LIST_HEAD(&master->bonds);
dev_iommu_priv_set(dev, master);

- /* Check the SIDs are in range of the SMMU and our stream table */
- for (i = 0; i < master->num_sids; i++) {
- u32 sid = master->sids[i];
-
- if (!arm_smmu_sid_in_range(smmu, sid)) {
- ret = -ERANGE;
- goto err_free_master;
- }
-
- /* Ensure l2 strtab is initialised */
- if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
- ret = arm_smmu_init_l2_strtab(smmu, sid);
- if (ret)
- goto err_free_master;
- }
- }
+ ret = arm_smmu_insert_master(smmu, master);
+ if (ret)
+ goto err_free_master;

master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);

@@ -2389,6 +2490,7 @@ static void arm_smmu_release_device(struct device *dev)
WARN_ON(arm_smmu_master_sva_enabled(master));
arm_smmu_detach_dev(master);
arm_smmu_disable_pasid(master);
+ arm_smmu_remove_master(master);
kfree(master);
iommu_fwspec_free(dev);
}
@@ -2808,6 +2910,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
{
int ret;

+ mutex_init(&smmu->streams_mutex);
+ smmu->streams = RB_ROOT;
+
ret = arm_smmu_init_queues(smmu);
if (ret)
return ret;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d4b7f40ccb02..19196eea7c1d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -636,6 +636,15 @@ struct arm_smmu_device {

/* IOMMU core code handle */
struct iommu_device iommu;
+
+ struct rb_root streams;
+ struct mutex streams_mutex;
+};
+
+struct arm_smmu_stream {
+ u32 id;
+ struct arm_smmu_master *master;
+ struct rb_node node;
};

/* SMMU private data for each master */
@@ -644,8 +653,8 @@ struct arm_smmu_master {
struct device *dev;
struct arm_smmu_domain *domain;
struct list_head domain_head;
- u32 *sids;
- unsigned int num_sids;
+ struct arm_smmu_stream *streams;
+ unsigned int num_streams;
bool ats_enabled;
bool sva_enabled;
struct list_head bonds;
--
2.21.3

2020-11-18 11:27:16

by Eric Auger

[permalink] [raw]
Subject: [PATCH v13 14/15] iommu/smmuv3: Accept configs with more than one context descriptor

In preparation for vSVA, let's accept userspace provided configs
with more than one CD. We check the max CD against the host iommu
capability and also the format (linear versus 2 level).

Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Shameer Kolothum <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 977c22d08612..ed64699a4a0d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2905,14 +2905,17 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
if (smmu_domain->s1_cfg.set)
goto out;

- /*
- * we currently support a single CD so s1fmt and s1dss
- * fields are also ignored
- */
- if (cfg->pasid_bits)
+ list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+ if (cfg->pasid_bits > master->ssid_bits)
+ goto out;
+ }
+ if (cfg->vendor_data.smmuv3.s1fmt == STRTAB_STE_0_S1FMT_64K_L2 &&
+ !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
goto out;

smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
+ smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
+ smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
smmu_domain->s1_cfg.set = true;
smmu_domain->abort = false;
break;
--
2.21.3

2020-11-18 11:28:09

by Eric Auger

[permalink] [raw]
Subject: [PATCH v13 10/15] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement

In nested mode we enforce the rule that all devices belonging
to the same iommu_domain share the same msi_domain.

Indeed if there were several physical MSI doorbells being used
within a single iommu_domain, it becomes really difficult to
resolve the nested stage mapping translating into the correct
physical doorbell. So let's forbid this situation.

Signed-off-by: Eric Auger <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++++++++++++++++++++
1 file changed, 41 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 24124361dd3b..5a05c2074c8a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2265,6 +2265,37 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
arm_smmu_install_ste_for_dev(master);
}

+static bool arm_smmu_share_msi_domain(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+ struct irq_domain *irqd = dev_get_msi_domain(dev);
+ struct arm_smmu_master *master;
+ unsigned long flags;
+ bool share = false;
+
+ if (!irqd)
+ return true;
+
+ spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+ list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+ struct irq_domain *d = dev_get_msi_domain(master->dev);
+
+ if (!d)
+ continue;
+ if (irqd != d) {
+ dev_info(dev, "Nested mode forbids to attach devices "
+ "using different physical MSI doorbells "
+ "to the same iommu_domain");
+ goto unlock;
+ }
+ }
+ share = true;
+unlock:
+ spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+ return share;
+}
+
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
{
int ret = 0;
@@ -2316,6 +2347,16 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
ret = -EINVAL;
goto out_unlock;
}
+ /*
+ * In nested mode we must check all devices belonging to the
+ * domain share the same physical MSI doorbell. Otherwise nested
+ * stage MSI binding is not supported.
+ */
+ if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
+ !arm_smmu_share_msi_domain(domain, dev)) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }

master->domain = smmu_domain;

--
2.21.3

2020-11-18 11:28:15

by Eric Auger

[permalink] [raw]
Subject: [PATCH v13 12/15] iommu/smmuv3: Implement bind/unbind_guest_msi

The bind/unbind_guest_msi() callbacks check the domain
is NESTED and redirect to the dma-iommu implementation.

Signed-off-by: Eric Auger <[email protected]>

---

v6 -> v7:
- remove device handle argument
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++++++++++
1 file changed, 43 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ccf2fef10b69..1b8dad340899 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2744,6 +2744,47 @@ static void arm_smmu_get_resv_regions(struct device *dev,
iommu_dma_get_resv_regions(dev, head);
}

+static int
+arm_smmu_bind_guest_msi(struct iommu_domain *domain,
+ dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+ struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+ struct arm_smmu_device *smmu;
+ int ret = -EINVAL;
+
+ mutex_lock(&smmu_domain->init_mutex);
+ smmu = smmu_domain->smmu;
+ if (!smmu)
+ goto out;
+
+ if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+ goto out;
+
+ ret = iommu_dma_bind_guest_msi(domain, giova, gpa, size);
+out:
+ mutex_unlock(&smmu_domain->init_mutex);
+ return ret;
+}
+
+static void
+arm_smmu_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova)
+{
+ struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+ struct arm_smmu_device *smmu;
+
+ mutex_lock(&smmu_domain->init_mutex);
+ smmu = smmu_domain->smmu;
+ if (!smmu)
+ goto unlock;
+
+ if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+ goto unlock;
+
+ iommu_dma_unbind_guest_msi(domain, giova);
+unlock:
+ mutex_unlock(&smmu_domain->init_mutex);
+}
+
static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
struct iommu_pasid_table_config *cfg)
{
@@ -2967,6 +3008,8 @@ static struct iommu_ops arm_smmu_ops = {
.attach_pasid_table = arm_smmu_attach_pasid_table,
.detach_pasid_table = arm_smmu_detach_pasid_table,
.cache_invalidate = arm_smmu_cache_invalidate,
+ .bind_guest_msi = arm_smmu_bind_guest_msi,
+ .unbind_guest_msi = arm_smmu_unbind_guest_msi,
.dev_has_feat = arm_smmu_dev_has_feature,
.dev_feat_enabled = arm_smmu_dev_feature_enabled,
.dev_enable_feat = arm_smmu_dev_enable_feature,
--
2.21.3

2020-11-18 11:28:46

by Eric Auger

[permalink] [raw]
Subject: [PATCH v13 15/15] iommu/smmuv3: Add PASID cache invalidation per PASID

In order to cascade guest CFGI_CD, let's add PASID cache invalidation
per PASID.

Signed-off-by: Eric Auger <[email protected]>

---

v12 -> v13:
- Fix !(info->flags & IOMMU_INV_PASID_FLAGS_PASID) check
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ed64699a4a0d..45adfe4da11b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2999,9 +2999,19 @@ arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
} else {
return -EINVAL;
}
- }
- if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
- inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+ } else if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID) {
+ if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+ struct iommu_inv_pasid_info *info =
+ &inv_info->granu.pasid_info;
+
+ if (!(info->flags & IOMMU_INV_PASID_FLAGS_PASID))
+ return -EINVAL;
+
+ arm_smmu_sync_cd(smmu_domain, info->pasid, true);
+ } else {
+ return -ENOENT;
+ }
+ } else { /* IOMMU_CACHE_INV_TYPE_DEV_IOTLB */
return -ENOENT;
}
return 0;
--
2.21.3

2020-11-18 11:28:49

by Eric Auger

[permalink] [raw]
Subject: [PATCH v13 13/15] iommu/smmuv3: Report non recoverable faults

When a stage 1 related fault event is read from the event queue,
let's propagate it to potential external fault listeners, ie. users
who registered a fault handler.

Signed-off-by: Eric Auger <[email protected]>

---
v8 -> v9:
- adapt to the removal of IOMMU_FAULT_UNRECOV_PERM_VALID:
only look at IOMMU_FAULT_UNRECOV_ADDR_VALID which comes with
perm
- do not advertise IOMMU_FAULT_UNRECOV_PASID_VALID faults for
translation faults
- trace errors if !master
- test nested before calling iommu_report_device_fault
- call the fault handler unconditionnally in non nested mode

v4 -> v5:
- s/IOMMU_FAULT_PERM_INST/IOMMU_FAULT_PERM_EXEC
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 102 +++++++++++++++++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 80 +++++++++++++++
2 files changed, 171 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 1b8dad340899..977c22d08612 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1397,7 +1397,6 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
return 0;
}

-__maybe_unused
static struct arm_smmu_master *
arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
{
@@ -1423,25 +1422,106 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
return master;
}

+/* Populates the record fields according to the input SMMU event */
+static bool arm_smmu_transcode_fault(u64 *evt, u8 type,
+ struct iommu_fault_unrecoverable *record)
+{
+ const struct arm_smmu_fault_propagation_data *data;
+ u32 fields;
+
+ if (type >= ARRAY_SIZE(fault_propagation))
+ return false;
+
+ data = &fault_propagation[type];
+ if (!data->reason)
+ return false;
+
+ fields = data->fields;
+
+ if (data->s1_check & FIELD_GET(EVTQ_1_S2, evt[1]))
+ return false; /* S2 related fault, don't propagate */
+
+ if (fields & IOMMU_FAULT_UNRECOV_PASID_VALID)
+ record->pasid = FIELD_GET(EVTQ_0_SUBSTREAMID, evt[0]);
+ else {
+ /* all other transcoded errors have SSV */
+ if (FIELD_GET(EVTQ_0_SSV, evt[0])) {
+ record->pasid = FIELD_GET(EVTQ_0_SUBSTREAMID, evt[0]);
+ fields |= IOMMU_FAULT_UNRECOV_PASID_VALID;
+ }
+ }
+
+ if (fields & IOMMU_FAULT_UNRECOV_ADDR_VALID) {
+ if (FIELD_GET(EVTQ_1_RNW, evt[1]))
+ record->perm = IOMMU_FAULT_PERM_READ;
+ else
+ record->perm = IOMMU_FAULT_PERM_WRITE;
+ if (FIELD_GET(EVTQ_1_PNU, evt[1]))
+ record->perm |= IOMMU_FAULT_PERM_PRIV;
+ if (FIELD_GET(EVTQ_1_IND, evt[1]))
+ record->perm |= IOMMU_FAULT_PERM_EXEC;
+ record->addr = evt[2];
+ }
+
+ if (fields & IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID)
+ record->fetch_addr = FIELD_GET(EVTQ_3_FETCH_ADDR, evt[3]);
+
+ record->flags = fields;
+ record->reason = data->reason;
+ return true;
+}
+
+static void arm_smmu_report_event(struct arm_smmu_device *smmu, u64 *evt)
+{
+ u32 sid = FIELD_GET(EVTQ_0_STREAMID, evt[0]);
+ u8 type = FIELD_GET(EVTQ_0_ID, evt[0]);
+ struct arm_smmu_master *master;
+ struct iommu_fault_event event = {};
+ bool nested;
+ int i;
+
+ master = arm_smmu_find_master(smmu, sid);
+ if (!master || !master->domain)
+ goto out;
+
+ event.fault.type = IOMMU_FAULT_DMA_UNRECOV;
+
+ nested = (master->domain->stage == ARM_SMMU_DOMAIN_NESTED);
+
+ if (nested) {
+ if (arm_smmu_transcode_fault(evt, type, &event.fault.event)) {
+ /*
+ * Only S1 related faults should be reported to the
+ * guest and must not flood the host log.
+ * Also a fault handler should have been registered
+ * to guarantee the full nested functionality
+ */
+ WARN_ON_ONCE(iommu_report_device_fault(master->dev,
+ &event));
+ return;
+ }
+ } else {
+ iommu_report_device_fault(master->dev, &event);
+ }
+out:
+ dev_info(smmu->dev, "event 0x%02x received:\n", type);
+ for (i = 0; i < EVTQ_ENT_DWORDS; ++i) {
+ dev_info(smmu->dev, "\t0x%016llx\n",
+ (unsigned long long)evt[i]);
+ }
+}
+
/* IRQ and event handlers */
static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
{
- int i;
struct arm_smmu_device *smmu = dev;
struct arm_smmu_queue *q = &smmu->evtq.q;
struct arm_smmu_ll_queue *llq = &q->llq;
u64 evt[EVTQ_ENT_DWORDS];

do {
- while (!queue_remove_raw(q, evt)) {
- u8 id = FIELD_GET(EVTQ_0_ID, evt[0]);
-
- dev_info(smmu->dev, "event 0x%02x received:\n", id);
- for (i = 0; i < ARRAY_SIZE(evt); ++i)
- dev_info(smmu->dev, "\t0x%016llx\n",
- (unsigned long long)evt[i]);
-
- }
+ while (!queue_remove_raw(q, evt))
+ arm_smmu_report_event(smmu, evt);

/*
* Not much we can do on overflow, so scream and pretend we're
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 269779dee8d1..452b094afac7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -152,6 +152,26 @@
#define ARM_SMMU_PRIQ_IRQ_CFG1 0xd8
#define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc

+/* Events */
+#define ARM_SMMU_EVT_F_UUT 0x01
+#define ARM_SMMU_EVT_C_BAD_STREAMID 0x02
+#define ARM_SMMU_EVT_F_STE_FETCH 0x03
+#define ARM_SMMU_EVT_C_BAD_STE 0x04
+#define ARM_SMMU_EVT_F_BAD_ATS_TREQ 0x05
+#define ARM_SMMU_EVT_F_STREAM_DISABLED 0x06
+#define ARM_SMMU_EVT_F_TRANSL_FORBIDDEN 0x07
+#define ARM_SMMU_EVT_C_BAD_SUBSTREAMID 0x08
+#define ARM_SMMU_EVT_F_CD_FETCH 0x09
+#define ARM_SMMU_EVT_C_BAD_CD 0x0a
+#define ARM_SMMU_EVT_F_WALK_EABT 0x0b
+#define ARM_SMMU_EVT_F_TRANSLATION 0x10
+#define ARM_SMMU_EVT_F_ADDR_SIZE 0x11
+#define ARM_SMMU_EVT_F_ACCESS 0x12
+#define ARM_SMMU_EVT_F_PERMISSION 0x13
+#define ARM_SMMU_EVT_F_TLB_CONFLICT 0x20
+#define ARM_SMMU_EVT_F_CFG_CONFLICT 0x21
+#define ARM_SMMU_EVT_E_PAGE_REQUEST 0x24
+
#define ARM_SMMU_REG_SZ 0xe00

/* Common MSI config fields */
@@ -370,6 +390,15 @@
#define EVTQ_MAX_SZ_SHIFT (Q_MAX_SZ_SHIFT - EVTQ_ENT_SZ_SHIFT)

#define EVTQ_0_ID GENMASK_ULL(7, 0)
+#define EVTQ_0_SSV GENMASK_ULL(11, 11)
+#define EVTQ_0_SUBSTREAMID GENMASK_ULL(31, 12)
+#define EVTQ_0_STREAMID GENMASK_ULL(63, 32)
+#define EVTQ_1_PNU GENMASK_ULL(33, 33)
+#define EVTQ_1_IND GENMASK_ULL(34, 34)
+#define EVTQ_1_RNW GENMASK_ULL(35, 35)
+#define EVTQ_1_S2 GENMASK_ULL(39, 39)
+#define EVTQ_1_CLASS GENMASK_ULL(40, 41)
+#define EVTQ_3_FETCH_ADDR GENMASK_ULL(51, 3)

/* PRI queue */
#define PRIQ_ENT_SZ_SHIFT 4
@@ -691,6 +720,57 @@ struct arm_smmu_domain {
spinlock_t devices_lock;
};

+/* fault propagation */
+struct arm_smmu_fault_propagation_data {
+ enum iommu_fault_reason reason;
+ bool s1_check;
+ u32 fields; /* IOMMU_FAULT_UNRECOV_*_VALID bits */
+};
+
+/*
+ * Describes how SMMU faults translate into generic IOMMU faults
+ * and if they need to be reported externally
+ */
+static const struct arm_smmu_fault_propagation_data fault_propagation[] = {
+[ARM_SMMU_EVT_F_UUT] = { },
+[ARM_SMMU_EVT_C_BAD_STREAMID] = { },
+[ARM_SMMU_EVT_F_STE_FETCH] = { },
+[ARM_SMMU_EVT_C_BAD_STE] = { },
+[ARM_SMMU_EVT_F_BAD_ATS_TREQ] = { },
+[ARM_SMMU_EVT_F_STREAM_DISABLED] = { },
+[ARM_SMMU_EVT_F_TRANSL_FORBIDDEN] = { },
+[ARM_SMMU_EVT_C_BAD_SUBSTREAMID] = {IOMMU_FAULT_REASON_PASID_INVALID,
+ false,
+ IOMMU_FAULT_UNRECOV_PASID_VALID
+ },
+[ARM_SMMU_EVT_F_CD_FETCH] = {IOMMU_FAULT_REASON_PASID_FETCH,
+ false,
+ IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+ },
+[ARM_SMMU_EVT_C_BAD_CD] = {IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+ false,
+ },
+[ARM_SMMU_EVT_F_WALK_EABT] = {IOMMU_FAULT_REASON_WALK_EABT, true,
+ IOMMU_FAULT_UNRECOV_ADDR_VALID |
+ IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_TRANSLATION] = {IOMMU_FAULT_REASON_PTE_FETCH, true,
+ IOMMU_FAULT_UNRECOV_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_ADDR_SIZE] = {IOMMU_FAULT_REASON_OOR_ADDRESS, true,
+ IOMMU_FAULT_UNRECOV_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_ACCESS] = {IOMMU_FAULT_REASON_ACCESS, true,
+ IOMMU_FAULT_UNRECOV_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_PERMISSION] = {IOMMU_FAULT_REASON_PERMISSION, true,
+ IOMMU_FAULT_UNRECOV_ADDR_VALID
+ },
+[ARM_SMMU_EVT_F_TLB_CONFLICT] = { },
+[ARM_SMMU_EVT_F_CFG_CONFLICT] = { },
+[ARM_SMMU_EVT_E_PAGE_REQUEST] = { },
+};
+
extern struct xarray arm_smmu_asid_xa;
extern struct mutex arm_smmu_asid_lock;

--
2.21.3

2020-11-19 04:03:50

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

Hi Eric,

I love your patch! Perhaps something to improve:

[auto build test WARNING on iommu/next]
[also build test WARNING on linus/master v5.10-rc4 next-20201118]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Eric-Auger/SMMUv3-Nested-Stage-Setup-IOMMU-part/20201118-192520
base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: arm64-randconfig-s031-20201118 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-123-g626c4742-dirty
# https://github.com/0day-ci/linux/commit/7308cdb07384d807c5ef43e6bfe0cd61c35a121e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Eric-Auger/SMMUv3-Nested-Stage-Setup-IOMMU-part/20201118-192520
git checkout 7308cdb07384d807c5ef43e6bfe0cd61c35a121e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=arm64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


"sparse warnings: (new ones prefixed by >>)"
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:1326:37: sparse: sparse: restricted __le64 degrades to integer
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:1326:37: sparse: sparse: cast to restricted __le64
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c: note: in included file (through arch/arm64/include/asm/atomic.h, include/linux/atomic.h, include/asm-generic/bitops/atomic.h, ...):
arch/arm64/include/asm/cmpxchg.h:172:1: sparse: sparse: cast truncates bits from constant value (ffffffff80000000 becomes 0)
arch/arm64/include/asm/cmpxchg.h:172:1: sparse: sparse: cast truncates bits from constant value (ffffffff80000000 becomes 0)

vim +1326 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c

1175
1176 static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
1177 __le64 *dst)
1178 {
1179 /*
1180 * This is hideously complicated, but we only really care about
1181 * three cases at the moment:
1182 *
1183 * 1. Invalid (all zero) -> bypass/fault (init)
1184 * 2. Bypass/fault -> single stage translation/bypass (attach)
1185 * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
1186 * 4. S2 -> S1 + S2 (attach_pasid_table)
1187 * 5. S1 + S2 -> S2 (detach_pasid_table)
1188 *
1189 * Given that we can't update the STE atomically and the SMMU
1190 * doesn't read the thing in a defined order, that leaves us
1191 * with the following maintenance requirements:
1192 *
1193 * 1. Update Config, return (init time STEs aren't live)
1194 * 2. Write everything apart from dword 0, sync, write dword 0, sync
1195 * 3. Update Config, sync
1196 */
1197 u64 val = le64_to_cpu(dst[0]);
1198 bool s1_live = false, s2_live = false, ste_live;
1199 bool abort, nested = false, translate = false;
1200 struct arm_smmu_device *smmu = NULL;
1201 struct arm_smmu_s1_cfg *s1_cfg;
1202 struct arm_smmu_s2_cfg *s2_cfg;
1203 struct arm_smmu_domain *smmu_domain = NULL;
1204 struct arm_smmu_cmdq_ent prefetch_cmd = {
1205 .opcode = CMDQ_OP_PREFETCH_CFG,
1206 .prefetch = {
1207 .sid = sid,
1208 },
1209 };
1210
1211 if (master) {
1212 smmu_domain = master->domain;
1213 smmu = master->smmu;
1214 }
1215
1216 if (smmu_domain) {
1217 s1_cfg = &smmu_domain->s1_cfg;
1218 s2_cfg = &smmu_domain->s2_cfg;
1219
1220 switch (smmu_domain->stage) {
1221 case ARM_SMMU_DOMAIN_S1:
1222 s1_cfg->set = true;
1223 s2_cfg->set = false;
1224 break;
1225 case ARM_SMMU_DOMAIN_S2:
1226 s1_cfg->set = false;
1227 s2_cfg->set = true;
1228 break;
1229 case ARM_SMMU_DOMAIN_NESTED:
1230 /*
1231 * Actual usage of stage 1 depends on nested mode:
1232 * legacy (2d stage only) or true nested mode
1233 */
1234 s2_cfg->set = true;
1235 break;
1236 default:
1237 break;
1238 }
1239 nested = s1_cfg->set && s2_cfg->set;
1240 translate = s1_cfg->set || s2_cfg->set;
1241 }
1242
1243 if (val & STRTAB_STE_0_V) {
1244 switch (FIELD_GET(STRTAB_STE_0_CFG, val)) {
1245 case STRTAB_STE_0_CFG_BYPASS:
1246 break;
1247 case STRTAB_STE_0_CFG_S1_TRANS:
1248 s1_live = true;
1249 break;
1250 case STRTAB_STE_0_CFG_S2_TRANS:
1251 s2_live = true;
1252 break;
1253 case STRTAB_STE_0_CFG_NESTED:
1254 s1_live = true;
1255 s2_live = true;
1256 break;
1257 case STRTAB_STE_0_CFG_ABORT:
1258 break;
1259 default:
1260 BUG(); /* STE corruption */
1261 }
1262 }
1263
1264 ste_live = s1_live || s2_live;
1265
1266 /* Nuke the existing STE_0 value, as we're going to rewrite it */
1267 val = STRTAB_STE_0_V;
1268
1269 /* Bypass/fault */
1270
1271 if (!smmu_domain)
1272 abort = disable_bypass;
1273 else
1274 abort = smmu_domain->abort;
1275
1276 if (abort || !translate) {
1277 if (abort)
1278 val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
1279 else
1280 val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
1281
1282 dst[0] = cpu_to_le64(val);
1283 dst[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
1284 STRTAB_STE_1_SHCFG_INCOMING));
1285 dst[2] = 0; /* Nuke the VMID */
1286 /*
1287 * The SMMU can perform negative caching, so we must sync
1288 * the STE regardless of whether the old value was live.
1289 */
1290 if (smmu)
1291 arm_smmu_sync_ste_for_sid(smmu, sid);
1292 return;
1293 }
1294
1295 BUG_ON(ste_live && !nested);
1296
1297 if (ste_live) {
1298 /* First invalidate the live STE */
1299 dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
1300 arm_smmu_sync_ste_for_sid(smmu, sid);
1301 }
1302
1303 if (s1_cfg->set) {
1304 BUG_ON(s1_live);
1305 dst[1] = cpu_to_le64(
1306 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
1307 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
1308 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
1309 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
1310 FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_NSEL1));
1311
1312 if (smmu->features & ARM_SMMU_FEAT_STALLS &&
1313 !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
1314 dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
1315
1316 val |= (s1_cfg->cdcfg.cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
1317 FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
1318 FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) |
1319 FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
1320 }
1321
1322 if (s2_cfg->set) {
1323 u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
1324
1325 if (s2_live) {
> 1326 u64 s2ttb = le64_to_cpu(dst[3] & STRTAB_STE_3_S2TTB_MASK);
1327
1328 BUG_ON(s2ttb != vttbr);
1329 }
1330
1331 dst[2] = cpu_to_le64(
1332 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
1333 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
1334 #ifdef __BIG_ENDIAN
1335 STRTAB_STE_2_S2ENDI |
1336 #endif
1337 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
1338 STRTAB_STE_2_S2R);
1339
1340 dst[3] = cpu_to_le64(vttbr);
1341
1342 val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
1343 } else {
1344 dst[2] = 0;
1345 dst[3] = 0;
1346 }
1347
1348 if (master->ats_enabled)
1349 dst[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
1350 STRTAB_STE_1_EATS_TRANS));
1351
1352 arm_smmu_sync_ste_for_sid(smmu, sid);
1353 /* See comment in arm_smmu_write_ctx_desc() */
1354 WRITE_ONCE(dst[0], cpu_to_le64(val));
1355 arm_smmu_sync_ste_for_sid(smmu, sid);
1356
1357 /* It's likely that we'll want to use the new STE soon */
1358 if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH))
1359 arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
1360 }
1361

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (8.69 kB)
.config.gz (28.36 kB)
Download all attachments

2020-12-03 13:06:07

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

Hi Kunkun,

On 12/3/20 1:32 PM, Kunkun Jiang wrote:
> Hi Eric,
>
> On 2020/11/18 19:21, Eric Auger wrote:
>> When nested stage translation is setup, both s1_cfg and
>> s2_cfg are set.
>>
>> We introduce a new smmu domain abort field that will be set
>> upon guest stage1 configuration passing.
>>
>> arm_smmu_write_strtab_ent() is modified to write both stage
>> fields in the STE and deal with the abort field.
>>
>> In nested mode, only stage 2 is "finalized" as the host does
>> not own/configure the stage 1 context descriptor; guest does.
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>> v10 -> v11:
>> - Fix an issue reported by Shameer when switching from with vSMMU
>> to without vSMMU. Despite the spec does not seem to mention it
>> seems to be needed to reset the 2 high 64b when switching from
>> S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>> On some implementations, if the S2TTB is not reset, this causes
>> a C_BAD_STE error
>> ---
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +++++++++++++++++----
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +
>> 2 files changed, 56 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 18ac5af1b284..412ea1bafa50 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> * three cases at the moment:
> Now, it should be *five cases*.
>> *
>> * 1. Invalid (all zero) -> bypass/fault (init)
>> - * 2. Bypass/fault -> translation/bypass (attach)
>> - * 3. Translation/bypass -> bypass/fault (detach)
>> + * 2. Bypass/fault -> single stage translation/bypass (attach)
>> + * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
>> + * 4. S2 -> S1 + S2 (attach_pasid_table)
>
> I was testing this series on one of our hardware board with SMMUv3. And
> I found while trying to /"//attach_pasid_table//"/,
>
> the sequence of STE (host) config(bit[3:1]) is /"S2->abort->S1 + S2"/.
> Because the maintenance is  /"Write everything apart///
>
> /from dword 0, sync, write dword 0, sync"/ when we update the STE
> (guest). Dose the sequence meet your expectation?

yes it does. I will fix the comments accordingly.

Is there anything to correct in the code or was it functional?

Thanks

Eric
>
>> + * 5. S1 + S2 -> S2 (detach_pasid_table)
>> *
>> * Given that we can't update the STE atomically and the SMMU
>> * doesn't read the thing in a defined order, that leaves us
>> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> * 3. Update Config, sync
>> */
>> u64 val = le64_to_cpu(dst[0]);
>> - bool ste_live = false;
>> + bool s1_live = false, s2_live = false, ste_live;
>> + bool abort, nested = false, translate = false;
>> struct arm_smmu_device *smmu = NULL;
>> struct arm_smmu_s1_cfg *s1_cfg;
>> struct arm_smmu_s2_cfg *s2_cfg;
>> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> default:
>> break;
>> }
>> + nested = s1_cfg->set && s2_cfg->set;
>> + translate = s1_cfg->set || s2_cfg->set;
>> }
>>
>> if (val & STRTAB_STE_0_V) {
>> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> case STRTAB_STE_0_CFG_BYPASS:
>> break;
>> case STRTAB_STE_0_CFG_S1_TRANS:
>> + s1_live = true;
>> + break;
>> case STRTAB_STE_0_CFG_S2_TRANS:
>> - ste_live = true;
>> + s2_live = true;
>> + break;
>> + case STRTAB_STE_0_CFG_NESTED:
>> + s1_live = true;
>> + s2_live = true;
>> break;
>> case STRTAB_STE_0_CFG_ABORT:
>> - BUG_ON(!disable_bypass);
>> break;
>> default:
>> BUG(); /* STE corruption */
>> }
>> }
>>
>> + ste_live = s1_live || s2_live;
>> +
>> /* Nuke the existing STE_0 value, as we're going to rewrite it */
>> val = STRTAB_STE_0_V;
>>
>> /* Bypass/fault */
>> - if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>> - if (!smmu_domain && disable_bypass)
>> +
>> + if (!smmu_domain)
>> + abort = disable_bypass;
>> + else
>> + abort = smmu_domain->abort;
>> +
>> + if (abort || !translate) {
>> + if (abort)
>> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
>> else
>> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
>> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> return;
>> }
>>
>> + BUG_ON(ste_live && !nested);
>> +
>> + if (ste_live) {
>> + /* First invalidate the live STE */
>> + dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
>> + arm_smmu_sync_ste_for_sid(smmu, sid);
>> + }
>> +
>> if (s1_cfg->set) {
>> - BUG_ON(ste_live);
>> + BUG_ON(s1_live);
>> dst[1] = cpu_to_le64(
>> FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
>> FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
>> @@ -1294,7 +1320,14 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> }
>>
>> if (s2_cfg->set) {
>> - BUG_ON(ste_live);
>> + u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
>> +
>> + if (s2_live) {
>> + u64 s2ttb = le64_to_cpu(dst[3] & STRTAB_STE_3_S2TTB_MASK);
>> +
>> + BUG_ON(s2ttb != vttbr);
>> + }
>> +
>> dst[2] = cpu_to_le64(
>> FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
>> FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
>> @@ -1304,9 +1337,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
>> STRTAB_STE_2_S2R);
>>
>> - dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
>> + dst[3] = cpu_to_le64(vttbr);
>>
>> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
>> + } else {
>> + dst[2] = 0;
>> + dst[3] = 0;
>> }
>>
>> if (master->ats_enabled)
>> @@ -1982,6 +2018,14 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>> return 0;
>> }
>>
>> + if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
>> + (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
>> + !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))) {
>> + dev_info(smmu_domain->smmu->dev,
>> + "does not implement two stages\n");
>> + return -EINVAL;
>> + }
>> +
>> /* Restrict the stage to what we can actually support */
>> if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
>> smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 07f59252dd21..269779dee8d1 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -206,6 +206,7 @@
>> #define STRTAB_STE_0_CFG_BYPASS 4
>> #define STRTAB_STE_0_CFG_S1_TRANS 5
>> #define STRTAB_STE_0_CFG_S2_TRANS 6
>> +#define STRTAB_STE_0_CFG_NESTED 7
>>
>> #define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4)
>> #define STRTAB_STE_0_S1FMT_LINEAR 0
>> @@ -682,6 +683,7 @@ struct arm_smmu_domain {
>> enum arm_smmu_domain_stage stage;
>> struct arm_smmu_s1_cfg s1_cfg;
>> struct arm_smmu_s2_cfg s2_cfg;
>> + bool abort;
>>
>> struct iommu_domain domain;
>
> Thanks,
>
> Kunkun Jiang
>

2020-12-03 13:28:16

by Kunkun Jiang

[permalink] [raw]
Subject: Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support


On 2020/12/3 21:01, Auger Eric wrote:
> Hi Kunkun,
>
> On 12/3/20 1:32 PM, Kunkun Jiang wrote:
>> Hi Eric,
>>
>> On 2020/11/18 19:21, Eric Auger wrote:
>>> When nested stage translation is setup, both s1_cfg and
>>> s2_cfg are set.
>>>
>>> We introduce a new smmu domain abort field that will be set
>>> upon guest stage1 configuration passing.
>>>
>>> arm_smmu_write_strtab_ent() is modified to write both stage
>>> fields in the STE and deal with the abort field.
>>>
>>> In nested mode, only stage 2 is "finalized" as the host does
>>> not own/configure the stage 1 context descriptor; guest does.
>>>
>>> Signed-off-by: Eric Auger <[email protected]>
>>>
>>> ---
>>> v10 -> v11:
>>> - Fix an issue reported by Shameer when switching from with vSMMU
>>> to without vSMMU. Despite the spec does not seem to mention it
>>> seems to be needed to reset the 2 high 64b when switching from
>>> S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>>> On some implementations, if the S2TTB is not reset, this causes
>>> a C_BAD_STE error
>>> ---
>>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +++++++++++++++++----
>>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +
>>> 2 files changed, 56 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> index 18ac5af1b284..412ea1bafa50 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>> * three cases at the moment:
>> Now, it should be *five cases*.
>>> *
>>> * 1. Invalid (all zero) -> bypass/fault (init)
>>> - * 2. Bypass/fault -> translation/bypass (attach)
>>> - * 3. Translation/bypass -> bypass/fault (detach)
>>> + * 2. Bypass/fault -> single stage translation/bypass (attach)
>>> + * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
>>> + * 4. S2 -> S1 + S2 (attach_pasid_table)
>> I was testing this series on one of our hardware board with SMMUv3. And
>> I found while trying to /"//attach_pasid_table//"/,
>>
>> the sequence of STE (host) config(bit[3:1]) is /"S2->abort->S1 + S2"/.
>> Because the maintenance is  /"Write everything apart///
>>
>> /from dword 0, sync, write dword 0, sync"/ when we update the STE
>> (guest). Dose the sequence meet your expectation?
> yes it does. I will fix the comments accordingly.
>
> Is there anything to correct in the code or was it functional?
>
> Thanks
>
> Eric
No, thanks. I just want to clarify the sequence.   :)
>>> + * 5. S1 + S2 -> S2 (detach_pasid_table)
>>> *
>>> * Given that we can't update the STE atomically and the SMMU
>>> * doesn't read the thing in a defined order, that leaves us
>>> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>> * 3. Update Config, sync
>>> */
>>> u64 val = le64_to_cpu(dst[0]);
>>> - bool ste_live = false;
>>> + bool s1_live = false, s2_live = false, ste_live;
>>> + bool abort, nested = false, translate = false;
>>> struct arm_smmu_device *smmu = NULL;
>>> struct arm_smmu_s1_cfg *s1_cfg;
>>> struct arm_smmu_s2_cfg *s2_cfg;
>>> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>> default:
>>> break;
>>> }
>>> + nested = s1_cfg->set && s2_cfg->set;
>>> + translate = s1_cfg->set || s2_cfg->set;
>>> }
>>>
>>> if (val & STRTAB_STE_0_V) {
>>> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>> case STRTAB_STE_0_CFG_BYPASS:
>>> break;
>>> case STRTAB_STE_0_CFG_S1_TRANS:
>>> + s1_live = true;
>>> + break;
>>> case STRTAB_STE_0_CFG_S2_TRANS:
>>> - ste_live = true;
>>> + s2_live = true;
>>> + break;
>>> + case STRTAB_STE_0_CFG_NESTED:
>>> + s1_live = true;
>>> + s2_live = true;
>>> break;
>>> case STRTAB_STE_0_CFG_ABORT:
>>> - BUG_ON(!disable_bypass);
>>> break;
>>> default:
>>> BUG(); /* STE corruption */
>>> }
>>> }
>>>
>>> + ste_live = s1_live || s2_live;
>>> +
>>> /* Nuke the existing STE_0 value, as we're going to rewrite it */
>>> val = STRTAB_STE_0_V;
>>>
>>> /* Bypass/fault */
>>> - if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>>> - if (!smmu_domain && disable_bypass)
>>> +
>>> + if (!smmu_domain)
>>> + abort = disable_bypass;
>>> + else
>>> + abort = smmu_domain->abort;
>>> +
>>> + if (abort || !translate) {
>>> + if (abort)
>>> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
>>> else
>>> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
>>> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>> return;
>>> }
>>>
>>> + BUG_ON(ste_live && !nested);
>>> +
>>> + if (ste_live) {
>>> + /* First invalidate the live STE */
>>> + dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
>>> + arm_smmu_sync_ste_for_sid(smmu, sid);
>>> + }
>>> +
>>> if (s1_cfg->set) {
>>> - BUG_ON(ste_live);
>>> + BUG_ON(s1_live);
>>> dst[1] = cpu_to_le64(
>>> FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
>>> FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
>>> @@ -1294,7 +1320,14 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>> }
>>>
>>> if (s2_cfg->set) {
>>> - BUG_ON(ste_live);
>>> + u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
>>> +
>>> + if (s2_live) {
>>> + u64 s2ttb = le64_to_cpu(dst[3] & STRTAB_STE_3_S2TTB_MASK);
>>> +
>>> + BUG_ON(s2ttb != vttbr);
>>> + }
>>> +
>>> dst[2] = cpu_to_le64(
>>> FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
>>> FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
>>> @@ -1304,9 +1337,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>> STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
>>> STRTAB_STE_2_S2R);
>>>
>>> - dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
>>> + dst[3] = cpu_to_le64(vttbr);
>>>
>>> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
>>> + } else {
>>> + dst[2] = 0;
>>> + dst[3] = 0;
>>> }
>>>
>>> if (master->ats_enabled)
>>> @@ -1982,6 +2018,14 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>>> return 0;
>>> }
>>>
>>> + if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
>>> + (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
>>> + !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))) {
>>> + dev_info(smmu_domain->smmu->dev,
>>> + "does not implement two stages\n");
>>> + return -EINVAL;
>>> + }
>>> +
>>> /* Restrict the stage to what we can actually support */
>>> if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
>>> smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> index 07f59252dd21..269779dee8d1 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> @@ -206,6 +206,7 @@
>>> #define STRTAB_STE_0_CFG_BYPASS 4
>>> #define STRTAB_STE_0_CFG_S1_TRANS 5
>>> #define STRTAB_STE_0_CFG_S2_TRANS 6
>>> +#define STRTAB_STE_0_CFG_NESTED 7
>>>
>>> #define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4)
>>> #define STRTAB_STE_0_S1FMT_LINEAR 0
>>> @@ -682,6 +683,7 @@ struct arm_smmu_domain {
>>> enum arm_smmu_domain_stage stage;
>>> struct arm_smmu_s1_cfg s1_cfg;
>>> struct arm_smmu_s2_cfg s2_cfg;
>>> + bool abort;
>>>
>>> struct iommu_domain domain;
>> Thanks,
>>
>> Kunkun Jiang
>>
> .

Thanks

Kunkun Jiang

Subject: RE: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

Hi Eric,

> -----Original Message-----
> From: Eric Auger [mailto:[email protected]]
> Sent: 18 November 2020 11:22
> To: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Shameerali Kolothum
> Thodi <[email protected]>;
> [email protected]; [email protected]; [email protected];
> [email protected]; yuzenghui <[email protected]>
> Subject: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage
> support
>
> When nested stage translation is setup, both s1_cfg and
> s2_cfg are set.
>
> We introduce a new smmu domain abort field that will be set
> upon guest stage1 configuration passing.
>
> arm_smmu_write_strtab_ent() is modified to write both stage
> fields in the STE and deal with the abort field.
>
> In nested mode, only stage 2 is "finalized" as the host does
> not own/configure the stage 1 context descriptor; guest does.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
> v10 -> v11:
> - Fix an issue reported by Shameer when switching from with vSMMU
> to without vSMMU. Despite the spec does not seem to mention it
> seems to be needed to reset the 2 high 64b when switching from
> S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
> On some implementations, if the S2TTB is not reset, this causes
> a C_BAD_STE error
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64
> +++++++++++++++++----
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +
> 2 files changed, 56 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 18ac5af1b284..412ea1bafa50 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
> * three cases at the moment:
> *
> * 1. Invalid (all zero) -> bypass/fault (init)
> - * 2. Bypass/fault -> translation/bypass (attach)
> - * 3. Translation/bypass -> bypass/fault (detach)
> + * 2. Bypass/fault -> single stage translation/bypass (attach)
> + * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
> + * 4. S2 -> S1 + S2 (attach_pasid_table)
> + * 5. S1 + S2 -> S2 (detach_pasid_table)
> *
> * Given that we can't update the STE atomically and the SMMU
> * doesn't read the thing in a defined order, that leaves us
> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
> * 3. Update Config, sync
> */
> u64 val = le64_to_cpu(dst[0]);
> - bool ste_live = false;
> + bool s1_live = false, s2_live = false, ste_live;
> + bool abort, nested = false, translate = false;
> struct arm_smmu_device *smmu = NULL;
> struct arm_smmu_s1_cfg *s1_cfg;
> struct arm_smmu_s2_cfg *s2_cfg;
> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
> default:
> break;
> }
> + nested = s1_cfg->set && s2_cfg->set;

This is a problem when the Guest is booted with iommu.passthrough = 1 as we
set s1_cfg.set = false for IOMMU_PASID_CONFIG_BYPASS.

Results in BUG_ON(ste_live && !nested).

Can we instead have nested = true set a bit above in the code, where we set
s2_cfg->set = true for the ARM_SMMU_DOMAIN_NESTED case?

Please take a look.

Thanks,
Shameer

> + translate = s1_cfg->set || s2_cfg->set;
> }
>
> if (val & STRTAB_STE_0_V) {
> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
> case STRTAB_STE_0_CFG_BYPASS:
> break;
> case STRTAB_STE_0_CFG_S1_TRANS:
> + s1_live = true;
> + break;
> case STRTAB_STE_0_CFG_S2_TRANS:
> - ste_live = true;
> + s2_live = true;
> + break;
> + case STRTAB_STE_0_CFG_NESTED:
> + s1_live = true;
> + s2_live = true;
> break;
> case STRTAB_STE_0_CFG_ABORT:
> - BUG_ON(!disable_bypass);
> break;
> default:
> BUG(); /* STE corruption */
> }
> }
>
> + ste_live = s1_live || s2_live;
> +
> /* Nuke the existing STE_0 value, as we're going to rewrite it */
> val = STRTAB_STE_0_V;
>
> /* Bypass/fault */
> - if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
> - if (!smmu_domain && disable_bypass)
> +
> + if (!smmu_domain)
> + abort = disable_bypass;
> + else
> + abort = smmu_domain->abort;
> +
> + if (abort || !translate) {
> + if (abort)
> val |= FIELD_PREP(STRTAB_STE_0_CFG,
> STRTAB_STE_0_CFG_ABORT);
> else
> val |= FIELD_PREP(STRTAB_STE_0_CFG,
> STRTAB_STE_0_CFG_BYPASS);
> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
> return;
> }
>
> + BUG_ON(ste_live && !nested);
> +
> + if (ste_live) {
> + /* First invalidate the live STE */
> + dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
> + arm_smmu_sync_ste_for_sid(smmu, sid);
> + }
> +
> if (s1_cfg->set) {
> - BUG_ON(ste_live);
> + BUG_ON(s1_live);
> dst[1] = cpu_to_le64(
> FIELD_PREP(STRTAB_STE_1_S1DSS,
> STRTAB_STE_1_S1DSS_SSID0) |
> FIELD_PREP(STRTAB_STE_1_S1CIR,
> STRTAB_STE_1_S1C_CACHE_WBRA) |
> @@ -1294,7 +1320,14 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
> }
>
> if (s2_cfg->set) {
> - BUG_ON(ste_live);
> + u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
> +
> + if (s2_live) {
> + u64 s2ttb = le64_to_cpu(dst[3] & STRTAB_STE_3_S2TTB_MASK);
> +
> + BUG_ON(s2ttb != vttbr);
> + }
> +
> dst[2] = cpu_to_le64(
> FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
> FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
> @@ -1304,9 +1337,12 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
> STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
> STRTAB_STE_2_S2R);
>
> - dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
> + dst[3] = cpu_to_le64(vttbr);
>
> val |= FIELD_PREP(STRTAB_STE_0_CFG,
> STRTAB_STE_0_CFG_S2_TRANS);
> + } else {
> + dst[2] = 0;
> + dst[3] = 0;
> }
>
> if (master->ats_enabled)
> @@ -1982,6 +2018,14 @@ static int arm_smmu_domain_finalise(struct
> iommu_domain *domain,
> return 0;
> }
>
> + if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
> + (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
> + !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))) {
> + dev_info(smmu_domain->smmu->dev,
> + "does not implement two stages\n");
> + return -EINVAL;
> + }
> +
> /* Restrict the stage to what we can actually support */
> if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
> smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 07f59252dd21..269779dee8d1 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -206,6 +206,7 @@
> #define STRTAB_STE_0_CFG_BYPASS 4
> #define STRTAB_STE_0_CFG_S1_TRANS 5
> #define STRTAB_STE_0_CFG_S2_TRANS 6
> +#define STRTAB_STE_0_CFG_NESTED 7
>
> #define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4)
> #define STRTAB_STE_0_S1FMT_LINEAR 0
> @@ -682,6 +683,7 @@ struct arm_smmu_domain {
> enum arm_smmu_domain_stage stage;
> struct arm_smmu_s1_cfg s1_cfg;
> struct arm_smmu_s2_cfg s2_cfg;
> + bool abort;
>
> struct iommu_domain domain;
>
> --
> 2.21.3

Subject: RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

Hi Eric,

> -----Original Message-----
> From: Eric Auger [mailto:[email protected]]
> Sent: 18 November 2020 11:22
> To: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Shameerali Kolothum
> Thodi <[email protected]>;
> [email protected]; [email protected]; [email protected];
> [email protected]; yuzenghui <[email protected]>
> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>
> This series brings the IOMMU part of HW nested paging support
> in the SMMUv3. The VFIO part is submitted separately.
>
> The IOMMU API is extended to support 2 new API functionalities:
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
>
> Then those capabilities gets implemented in the SMMUv3 driver.
>
> The virtualizer passes information through the VFIO user API
> which cascades them to the iommu subsystem. This allows the guest
> to own stage 1 tables and context descriptors (so-called PASID
> table) while the host owns stage 2 tables and main configuration
> structures (STE).

I am seeing an issue with Guest testpmd run with this series.
I have two different setups and testpmd works fine with the
first one but not with the second.

1). Guest doesn't have kernel driver built-in for pass-through dev.

root@ubuntu:/# lspci -v
...
00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
Subsystem: Huawei Technologies Co., Ltd. Device 0000
Flags: fast devsel
Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
Capabilities: [b0] Power Management version 3
Capabilities: [100] Access Control Services
Capabilities: [300] Transaction Processing Hints

root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe

root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Invalid NUMA socket, default to 0
EAL: using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc

Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.

Configuring Port 0 (socket 0)
Port 0: 8E:A6:8C:43:43:45
Checking link statuses...
Done
testpmd>

2). Guest have kernel driver built-in for pass-through dev.

root@ubuntu:/# lspci -v
...
00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
Subsystem: Huawei Technologies Co., Ltd. Device 0000
Flags: bus master, fast devsel, latency 0
Memory at 8000100000 (64-bit, prefetchable) [size=64K]
Memory at 8000000000 (64-bit, prefetchable) [size=1M]
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
Capabilities: [b0] Power Management version 3
Capabilities: [100] Access Control Services
Capabilities: [300] Transaction Processing Hints
Kernel driver in use: hns3

root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe

root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Invalid NUMA socket, default to 0
EAL: using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from PF in_irq:0
hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
hns3vf_init_vf(): Failed to fetch configuration: -62
hns3vf_dev_init(): Failed to init vf: -62
EAL: Releasing pci mapped resource for 0000:00:02.0
EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
EAL: Requested device 0000:00:02.0 cannot be used
EAL: Bus (pci) probe failed.
EAL: No legacy callbacks, legacy socket not created
testpmd: No probed ethernet devices
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Done
testpmd>

And in this case, smmu(host) reports a translation fault,

[ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
[ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
[ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
[ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000

Tested with Intel 82599 card(ixgbevf) as well. but same errror.

Not able to root cause the problem yet. With the hope that, this is
related to tlb entries not being invlaidated properly, I tried explicitly
issuing CMD_TLBI_NSNH_ALL and CMD_CFGI_CD_ALL just before
the STE update, but no luck yet :(

Please let me know if I am missing something here or has any clue if you
can replicate this on your setup.

Thanks,
Shameer

>
> Best Regards
>
> Eric
>
> This series can be found at:
> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
> (including the VFIO part in his last version: v11)
>
> The series includes a patch from Jean-Philippe. It is better to
> review the original patch:
> [PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure
>
> The VFIO series is sent separately.
>
> History:
>
> v12 -> v13:
> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
> reported by Shameer. This urged me to revisit patch 4 into
> iommu/smmuv3: Allow s1 and s2 configs to coexist where
> s1_cfg and s2_cfg are not dynamically allocated anymore.
> Instead I use a new set field in existing structs
> - fixed 2 others config checks
> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
> according to the last version
>
> v11 -> v12:
> - rebase on top of v5.10-rc4
>
> Eric Auger (14):
> iommu: Introduce attach/detach_pasid_table API
> iommu: Introduce bind/unbind_guest_msi
> iommu/smmuv3: Allow s1 and s2 configs to coexist
> iommu/smmuv3: Get prepared for nested stage support
> iommu/smmuv3: Implement attach/detach_pasid_table
> iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
> iommu/smmuv3: Implement cache_invalidate
> dma-iommu: Implement NESTED_MSI cookie
> iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
> iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
> regions
> iommu/smmuv3: Implement bind/unbind_guest_msi
> iommu/smmuv3: Report non recoverable faults
> iommu/smmuv3: Accept configs with more than one context descriptor
> iommu/smmuv3: Add PASID cache invalidation per PASID
>
> Jean-Philippe Brucker (1):
> iommu/arm-smmu-v3: Maintain a SID->device structure
>
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 659
> ++++++++++++++++++--
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 103 ++-
> drivers/iommu/dma-iommu.c | 142 ++++-
> drivers/iommu/iommu.c | 105 ++++
> include/linux/dma-iommu.h | 16 +
> include/linux/iommu.h | 41 ++
> include/uapi/linux/iommu.h | 54 ++
> 7 files changed, 1042 insertions(+), 78 deletions(-)
>
> --
> 2.21.3

2021-01-13 15:42:18

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

Hi Shameer,

On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger [mailto:[email protected]]
>> Sent: 18 November 2020 11:22
>> To: [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]
>> Cc: [email protected]; [email protected];
>> [email protected]; [email protected]; Shameerali Kolothum
>> Thodi <[email protected]>;
>> [email protected]; [email protected]; [email protected];
>> [email protected]; yuzenghui <[email protected]>
>> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> The IOMMU API is extended to support 2 new API functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Then those capabilities gets implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API
>> which cascades them to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (so-called PASID
>> table) while the host owns stage 2 tables and main configuration
>> structures (STE).
>
> I am seeing an issue with Guest testpmd run with this series.
> I have two different setups and testpmd works fine with the
> first one but not with the second.
>
> 1). Guest doesn't have kernel driver built-in for pass-through dev.
>
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 0000
> Flags: fast devsel
> Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
> Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
>
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
>
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: Invalid NUMA socket, default to 0
> EAL: using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> EAL: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
>
> Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
>
> Configuring Port 0 (socket 0)
> Port 0: 8E:A6:8C:43:43:45
> Checking link statuses...
> Done
> testpmd>
>
> 2). Guest have kernel driver built-in for pass-through dev.
>
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 0000
> Flags: bus master, fast devsel, latency 0
> Memory at 8000100000 (64-bit, prefetchable) [size=64K]
> Memory at 8000000000 (64-bit, prefetchable) [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> Kernel driver in use: hns3
>
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
>
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: Invalid NUMA socket, default to 0
> EAL: using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> 0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from PF in_irq:0
> hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
> hns3vf_init_vf(): Failed to fetch configuration: -62
> hns3vf_dev_init(): Failed to init vf: -62
> EAL: Releasing pci mapped resource for 0000:00:02.0
> EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
> EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
> EAL: Requested device 0000:00:02.0 cannot be used
> EAL: Bus (pci) probe failed.
> EAL: No legacy callbacks, legacy socket not created
> testpmd: No probed ethernet devices
> Interactive-mode selected
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Done
> testpmd>
>
> And in this case, smmu(host) reports a translation fault,
>
> [ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
> [ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
> [ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
> [ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000
>
> Tested with Intel 82599 card(ixgbevf) as well. but same errror.
>
> Not able to root cause the problem yet. With the hope that, this is
> related to tlb entries not being invlaidated properly, I tried explicitly
> issuing CMD_TLBI_NSNH_ALL and CMD_CFGI_CD_ALL just before
> the STE update, but no luck yet :(
>
> Please let me know if I am missing something here or has any clue if you
> can replicate this on your setup.

Thank for for the report. I need to setup the DPDK environment again and
I will test on my end. I plan to respin next week and study all the
issues you reported up to now. I will rebase on Jean's last branch.

Thanks

Eric
>
> Thanks,
> Shameer
>
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
>> (including the VFIO part in his last version: v11)
>>
>> The series includes a patch from Jean-Philippe. It is better to
>> review the original patch:
>> [PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure
>>
>> The VFIO series is sent separately.
>>
>> History:
>>
>> v12 -> v13:
>> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
>> reported by Shameer. This urged me to revisit patch 4 into
>> iommu/smmuv3: Allow s1 and s2 configs to coexist where
>> s1_cfg and s2_cfg are not dynamically allocated anymore.
>> Instead I use a new set field in existing structs
>> - fixed 2 others config checks
>> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
>> according to the last version
>>
>> v11 -> v12:
>> - rebase on top of v5.10-rc4
>>
>> Eric Auger (14):
>> iommu: Introduce attach/detach_pasid_table API
>> iommu: Introduce bind/unbind_guest_msi
>> iommu/smmuv3: Allow s1 and s2 configs to coexist
>> iommu/smmuv3: Get prepared for nested stage support
>> iommu/smmuv3: Implement attach/detach_pasid_table
>> iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>> iommu/smmuv3: Implement cache_invalidate
>> dma-iommu: Implement NESTED_MSI cookie
>> iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>> iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
>> regions
>> iommu/smmuv3: Implement bind/unbind_guest_msi
>> iommu/smmuv3: Report non recoverable faults
>> iommu/smmuv3: Accept configs with more than one context descriptor
>> iommu/smmuv3: Add PASID cache invalidation per PASID
>>
>> Jean-Philippe Brucker (1):
>> iommu/arm-smmu-v3: Maintain a SID->device structure
>>
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 659
>> ++++++++++++++++++--
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 103 ++-
>> drivers/iommu/dma-iommu.c | 142 ++++-
>> drivers/iommu/iommu.c | 105 ++++
>> include/linux/dma-iommu.h | 16 +
>> include/linux/iommu.h | 41 ++
>> include/uapi/linux/iommu.h | 54 ++
>> 7 files changed, 1042 insertions(+), 78 deletions(-)
>>
>> --
>> 2.21.3
>

2021-02-01 12:29:53

by Keqian Zhu

[permalink] [raw]
Subject: Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> From: Jean-Philippe Brucker <[email protected]>
>
> When handling faults from the event or PRI queue, we need to find the
> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>
> Signed-off-by: Jean-Philippe Brucker <[email protected]>
[...]

> }
>
> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> + struct arm_smmu_master *master)
> +{
> + int i;
> + int ret = 0;
> + struct arm_smmu_stream *new_stream, *cur_stream;
> + struct rb_node **new_node, *parent_node = NULL;
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> +
> + master->streams = kcalloc(fwspec->num_ids,
> + sizeof(struct arm_smmu_stream), GFP_KERNEL);
> + if (!master->streams)
> + return -ENOMEM;
> + master->num_streams = fwspec->num_ids;
This is not roll-backed when fail.

> +
> + mutex_lock(&smmu->streams_mutex);
> + for (i = 0; i < fwspec->num_ids && !ret; i++) {
Check ret at here, makes it hard to decide the start index of rollback.

If we fail at here, then start index is (i-2).
If we fail in the loop, then start index is (i-1).

> + u32 sid = fwspec->ids[i];
> +
> + new_stream = &master->streams[i];
> + new_stream->id = sid;
> + new_stream->master = master;
> +
> + /*
> + * Check the SIDs are in range of the SMMU and our stream table
> + */
> + if (!arm_smmu_sid_in_range(smmu, sid)) {
> + ret = -ERANGE;
> + break;
> + }
> +
> + /* Ensure l2 strtab is initialised */
> + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
> + ret = arm_smmu_init_l2_strtab(smmu, sid);
> + if (ret)
> + break;
> + }
> +
> + /* Insert into SID tree */
> + new_node = &(smmu->streams.rb_node);
> + while (*new_node) {
> + cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
> + node);
> + parent_node = *new_node;
> + if (cur_stream->id > new_stream->id) {
> + new_node = &((*new_node)->rb_left);
> + } else if (cur_stream->id < new_stream->id) {
> + new_node = &((*new_node)->rb_right);
> + } else {
> + dev_warn(master->dev,
> + "stream %u already in tree\n",
> + cur_stream->id);
> + ret = -EINVAL;
> + break;
> + }
> + }
> +
> + if (!ret) {
> + rb_link_node(&new_stream->node, parent_node, new_node);
> + rb_insert_color(&new_stream->node, &smmu->streams);
> + }
> + }
> +
> + if (ret) {
> + for (; i > 0; i--)
should be (i >= 0)?
And the start index seems not correct.

> + rb_erase(&master->streams[i].node, &smmu->streams);
> + kfree(master->streams);
> + }
> + mutex_unlock(&smmu->streams_mutex);
> +
> + return ret;
> +}
> +
> +static void arm_smmu_remove_master(struct arm_smmu_master *master)
> +{
> + int i;
> + struct arm_smmu_device *smmu = master->smmu;
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> +
> + if (!smmu || !master->streams)
> + return;
> +
> + mutex_lock(&smmu->streams_mutex);
> + for (i = 0; i < fwspec->num_ids; i++)
> + rb_erase(&master->streams[i].node, &smmu->streams);
> + mutex_unlock(&smmu->streams_mutex);
> +
> + kfree(master->streams);
> +}
> +
> static struct iommu_ops arm_smmu_ops;
>
> static struct iommu_device *arm_smmu_probe_device(struct device *dev)
> {
> - int i, ret;
> + int ret;
> struct arm_smmu_device *smmu;
> struct arm_smmu_master *master;
> struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> @@ -2331,27 +2447,12 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>
> master->dev = dev;
> master->smmu = smmu;
> - master->sids = fwspec->ids;
> - master->num_sids = fwspec->num_ids;
> INIT_LIST_HEAD(&master->bonds);
> dev_iommu_priv_set(dev, master);
>
> - /* Check the SIDs are in range of the SMMU and our stream table */
> - for (i = 0; i < master->num_sids; i++) {
> - u32 sid = master->sids[i];
> -
> - if (!arm_smmu_sid_in_range(smmu, sid)) {
> - ret = -ERANGE;
> - goto err_free_master;
> - }
> -
> - /* Ensure l2 strtab is initialised */
> - if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
> - ret = arm_smmu_init_l2_strtab(smmu, sid);
> - if (ret)
> - goto err_free_master;
> - }
> - }
> + ret = arm_smmu_insert_master(smmu, master);
> + if (ret)
> + goto err_free_master;
>
> master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
>
> @@ -2389,6 +2490,7 @@ static void arm_smmu_release_device(struct device *dev)
> WARN_ON(arm_smmu_master_sva_enabled(master));
> arm_smmu_detach_dev(master);
> arm_smmu_disable_pasid(master);
> + arm_smmu_remove_master(master);
> kfree(master);

Thanks,
Keqian

2021-02-01 12:37:20

by Keqian Zhu

[permalink] [raw]
Subject: Re: [PATCH v13 04/15] iommu/smmuv3: Allow s1 and s2 configs to coexist

Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> In true nested mode, both s1_cfg and s2_cfg will coexist.
> Let's remove the union and add a "set" field in each
> config structure telling whether the config is set and needs
> to be applied when writing the STE. In legacy nested mode,
> only the 2d stage is used. In true nested mode, the "set" field
nit: s/2d/2nd

> will be set when the guest passes the pasid table.
nit: ... the "set" filed of s1_cfg and s2_cfg will be set ...

>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
> v12 -> v13:
> - does not dynamically allocate s1-cfg and s2_cfg anymore. Add
> the set field
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++--------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 ++--
> 2 files changed, 31 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 1e4acc7f3d3c..18ac5af1b284 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1195,8 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> u64 val = le64_to_cpu(dst[0]);
> bool ste_live = false;
> struct arm_smmu_device *smmu = NULL;
> - struct arm_smmu_s1_cfg *s1_cfg = NULL;
> - struct arm_smmu_s2_cfg *s2_cfg = NULL;
> + struct arm_smmu_s1_cfg *s1_cfg;
> + struct arm_smmu_s2_cfg *s2_cfg;
> struct arm_smmu_domain *smmu_domain = NULL;
> struct arm_smmu_cmdq_ent prefetch_cmd = {
> .opcode = CMDQ_OP_PREFETCH_CFG,
> @@ -1211,13 +1211,24 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> }
>
> if (smmu_domain) {
> + s1_cfg = &smmu_domain->s1_cfg;
> + s2_cfg = &smmu_domain->s2_cfg;
> +
> switch (smmu_domain->stage) {
> case ARM_SMMU_DOMAIN_S1:
> - s1_cfg = &smmu_domain->s1_cfg;
> + s1_cfg->set = true;
> + s2_cfg->set = false;
> break;
> case ARM_SMMU_DOMAIN_S2:
> + s1_cfg->set = false;
> + s2_cfg->set = true;
> + break;
> case ARM_SMMU_DOMAIN_NESTED:
> - s2_cfg = &smmu_domain->s2_cfg;
> + /*
> + * Actual usage of stage 1 depends on nested mode:
> + * legacy (2d stage only) or true nested mode
> + */
> + s2_cfg->set = true;
> break;
> default:
> break;
> @@ -1244,7 +1255,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> val = STRTAB_STE_0_V;
>
> /* Bypass/fault */
> - if (!smmu_domain || !(s1_cfg || s2_cfg)) {
> + if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
> if (!smmu_domain && disable_bypass)
> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
> else
> @@ -1263,7 +1274,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> return;
> }
>
> - if (s1_cfg) {
> + if (s1_cfg->set) {
> BUG_ON(ste_live);
> dst[1] = cpu_to_le64(
> FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
> @@ -1282,7 +1293,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
> }
>
> - if (s2_cfg) {
> + if (s2_cfg->set) {
> BUG_ON(ste_live);
> dst[2] = cpu_to_le64(
> FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
> @@ -1846,24 +1857,24 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
> {
> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> struct arm_smmu_device *smmu = smmu_domain->smmu;
> + struct arm_smmu_s1_cfg *s1_cfg = &smmu_domain->s1_cfg;
> + struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
>
> iommu_put_dma_cookie(domain);
> free_io_pgtable_ops(smmu_domain->pgtbl_ops);
>
> /* Free the CD and ASID, if we allocated them */
> - if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> - struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> -
> + if (s1_cfg->set) {
> /* Prevent SVA from touching the CD while we're freeing it */
> mutex_lock(&arm_smmu_asid_lock);
> - if (cfg->cdcfg.cdtab)
> + if (s1_cfg->cdcfg.cdtab)
> arm_smmu_free_cd_tables(smmu_domain);
> - arm_smmu_free_asid(&cfg->cd);
> + arm_smmu_free_asid(&s1_cfg->cd);
> mutex_unlock(&arm_smmu_asid_lock);
> - } else {
> - struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
> - if (cfg->vmid)
> - arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
> + }
> + if (s2_cfg->set) {
> + if (s2_cfg->vmid)
> + arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
> }
>
> kfree(smmu_domain);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 19196eea7c1d..07f59252dd21 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -562,12 +562,14 @@ struct arm_smmu_s1_cfg {
> struct arm_smmu_ctx_desc cd;
> u8 s1fmt;
> u8 s1cdmax;
> + bool set;
> };
>
> struct arm_smmu_s2_cfg {
> u16 vmid;
> u64 vttbr;
> u64 vtcr;
> + bool set;
> };
>
> struct arm_smmu_strtab_cfg {
> @@ -678,10 +680,8 @@ struct arm_smmu_domain {
> atomic_t nr_ats_masters;
>
> enum arm_smmu_domain_stage stage;
> - union {
> - struct arm_smmu_s1_cfg s1_cfg;
> - struct arm_smmu_s2_cfg s2_cfg;
> - };
> + struct arm_smmu_s1_cfg s1_cfg;
> + struct arm_smmu_s2_cfg s2_cfg;
>
> struct iommu_domain domain;
>
Other looks good to me. ;-)
>

Thanks,
Keqian

2021-02-01 15:19:56

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

On Mon, Feb 01, 2021 at 08:26:41PM +0800, Keqian Zhu wrote:
> > +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> > + struct arm_smmu_master *master)
> > +{
> > + int i;
> > + int ret = 0;
> > + struct arm_smmu_stream *new_stream, *cur_stream;
> > + struct rb_node **new_node, *parent_node = NULL;
> > + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> > +
> > + master->streams = kcalloc(fwspec->num_ids,
> > + sizeof(struct arm_smmu_stream), GFP_KERNEL);
> > + if (!master->streams)
> > + return -ENOMEM;
> > + master->num_streams = fwspec->num_ids;
> This is not roll-backed when fail.

No need, the caller frees master

> > +
> > + mutex_lock(&smmu->streams_mutex);
> > + for (i = 0; i < fwspec->num_ids && !ret; i++) {
> Check ret at here, makes it hard to decide the start index of rollback.
>
> If we fail at here, then start index is (i-2).
> If we fail in the loop, then start index is (i-1).
>
[...]
> > + if (ret) {
> > + for (; i > 0; i--)
> should be (i >= 0)?
> And the start index seems not correct.

Indeed, this whole bit is wrong. I'll fix it while resending the IOPF
series.

Thanks,
Jean

2021-02-01 17:26:06

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

Hi Keqian,

On 2/1/21 1:26 PM, Keqian Zhu wrote:
> Hi Eric,
>
> On 2020/11/18 19:21, Eric Auger wrote:
>> From: Jean-Philippe Brucker <[email protected]>
>>
>> When handling faults from the event or PRI queue, we need to find the
>> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>>
>> Signed-off-by: Jean-Philippe Brucker <[email protected]>
> [...]
>
>> }
>>
>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>> + struct arm_smmu_master *master)
>> +{
>> + int i;
>> + int ret = 0;
>> + struct arm_smmu_stream *new_stream, *cur_stream;
>> + struct rb_node **new_node, *parent_node = NULL;
>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>> +
>> + master->streams = kcalloc(fwspec->num_ids,
>> + sizeof(struct arm_smmu_stream), GFP_KERNEL);
>> + if (!master->streams)
>> + return -ENOMEM;
>> + master->num_streams = fwspec->num_ids;
> This is not roll-backed when fail.
>
>> +
>> + mutex_lock(&smmu->streams_mutex);
>> + for (i = 0; i < fwspec->num_ids && !ret; i++) {
> Check ret at here, makes it hard to decide the start index of rollback.
>
> If we fail at here, then start index is (i-2).
> If we fail in the loop, then start index is (i-1).
>
>> + u32 sid = fwspec->ids[i];
>> +
>> + new_stream = &master->streams[i];
>> + new_stream->id = sid;
>> + new_stream->master = master;
>> +
>> + /*
>> + * Check the SIDs are in range of the SMMU and our stream table
>> + */
>> + if (!arm_smmu_sid_in_range(smmu, sid)) {
>> + ret = -ERANGE;
>> + break;
>> + }
>> +
>> + /* Ensure l2 strtab is initialised */
>> + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
>> + ret = arm_smmu_init_l2_strtab(smmu, sid);
>> + if (ret)
>> + break;
>> + }
>> +
>> + /* Insert into SID tree */
>> + new_node = &(smmu->streams.rb_node);
>> + while (*new_node) {
>> + cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
>> + node);
>> + parent_node = *new_node;
>> + if (cur_stream->id > new_stream->id) {
>> + new_node = &((*new_node)->rb_left);
>> + } else if (cur_stream->id < new_stream->id) {
>> + new_node = &((*new_node)->rb_right);
>> + } else {
>> + dev_warn(master->dev,
>> + "stream %u already in tree\n",
>> + cur_stream->id);
>> + ret = -EINVAL;
>> + break;
>> + }
>> + }
>> +
>> + if (!ret) {
>> + rb_link_node(&new_stream->node, parent_node, new_node);
>> + rb_insert_color(&new_stream->node, &smmu->streams);
>> + }
>> + }
>> +
>> + if (ret) {
>> + for (; i > 0; i--)
> should be (i >= 0)?
> And the start index seems not correct.
>
>> + rb_erase(&master->streams[i].node, &smmu->streams);
>> + kfree(master->streams);
>> + }
>> + mutex_unlock(&smmu->streams_mutex);
>> +
>> + return ret;
>> +}
>> +
>> +static void arm_smmu_remove_master(struct arm_smmu_master *master)
>> +{
>> + int i;
>> + struct arm_smmu_device *smmu = master->smmu;
>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>> +
>> + if (!smmu || !master->streams)
>> + return;
>> +
>> + mutex_lock(&smmu->streams_mutex);
>> + for (i = 0; i < fwspec->num_ids; i++)
>> + rb_erase(&master->streams[i].node, &smmu->streams);
>> + mutex_unlock(&smmu->streams_mutex);
>> +
>> + kfree(master->streams);
>> +}
>> +
>> static struct iommu_ops arm_smmu_ops;
>>
>> static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>> {
>> - int i, ret;
>> + int ret;
>> struct arm_smmu_device *smmu;
>> struct arm_smmu_master *master;
>> struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> @@ -2331,27 +2447,12 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>>
>> master->dev = dev;
>> master->smmu = smmu;
>> - master->sids = fwspec->ids;
>> - master->num_sids = fwspec->num_ids;
>> INIT_LIST_HEAD(&master->bonds);
>> dev_iommu_priv_set(dev, master);
>>
>> - /* Check the SIDs are in range of the SMMU and our stream table */
>> - for (i = 0; i < master->num_sids; i++) {
>> - u32 sid = master->sids[i];
>> -
>> - if (!arm_smmu_sid_in_range(smmu, sid)) {
>> - ret = -ERANGE;
>> - goto err_free_master;
>> - }
>> -
>> - /* Ensure l2 strtab is initialised */
>> - if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
>> - ret = arm_smmu_init_l2_strtab(smmu, sid);
>> - if (ret)
>> - goto err_free_master;
>> - }
>> - }
>> + ret = arm_smmu_insert_master(smmu, master);
>> + if (ret)
>> + goto err_free_master;
>>
>> master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
>>
>> @@ -2389,6 +2490,7 @@ static void arm_smmu_release_device(struct device *dev)
>> WARN_ON(arm_smmu_master_sva_enabled(master));
>> arm_smmu_detach_dev(master);
>> arm_smmu_disable_pasid(master);
>> + arm_smmu_remove_master(master);
>> kfree(master);
>
> Thanks,
> Keqian
>
Thank you for the review. Jean will address this issues in his own
series and on my end I will rebase on this latter.

Best Regards

Eric


2021-02-02 06:44:25

by Keqian Zhu

[permalink] [raw]
Subject: Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

Hi Jean,

On 2021/2/1 23:15, Jean-Philippe Brucker wrote:
> On Mon, Feb 01, 2021 at 08:26:41PM +0800, Keqian Zhu wrote:
>>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>>> + struct arm_smmu_master *master)
>>> +{
>>> + int i;
>>> + int ret = 0;
>>> + struct arm_smmu_stream *new_stream, *cur_stream;
>>> + struct rb_node **new_node, *parent_node = NULL;
>>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>>> +
>>> + master->streams = kcalloc(fwspec->num_ids,
>>> + sizeof(struct arm_smmu_stream), GFP_KERNEL);
>>> + if (!master->streams)
>>> + return -ENOMEM;
>>> + master->num_streams = fwspec->num_ids;
>> This is not roll-backed when fail.
>
> No need, the caller frees master
OK.

>
>>> +
>>> + mutex_lock(&smmu->streams_mutex);
>>> + for (i = 0; i < fwspec->num_ids && !ret; i++) {
>> Check ret at here, makes it hard to decide the start index of rollback.
>>
>> If we fail at here, then start index is (i-2).
>> If we fail in the loop, then start index is (i-1).
>>
> [...]
>>> + if (ret) {
>>> + for (; i > 0; i--)
>> should be (i >= 0)?
>> And the start index seems not correct.
>
> Indeed, this whole bit is wrong. I'll fix it while resending the IOPF
> series.
>
> Thanks,
> Jean
OK, I am glad it helps.

Thanks,
Keqian

2021-02-02 07:17:42

by Keqian Zhu

[permalink] [raw]
Subject: Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> When nested stage translation is setup, both s1_cfg and
> s2_cfg are set.
>
> We introduce a new smmu domain abort field that will be set
> upon guest stage1 configuration passing.
>
> arm_smmu_write_strtab_ent() is modified to write both stage
> fields in the STE and deal with the abort field.
>
> In nested mode, only stage 2 is "finalized" as the host does
> not own/configure the stage 1 context descriptor; guest does.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
> v10 -> v11:
> - Fix an issue reported by Shameer when switching from with vSMMU
> to without vSMMU. Despite the spec does not seem to mention it
> seems to be needed to reset the 2 high 64b when switching from
> S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
> On some implementations, if the S2TTB is not reset, this causes
> a C_BAD_STE error
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +++++++++++++++++----
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +
> 2 files changed, 56 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 18ac5af1b284..412ea1bafa50 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> * three cases at the moment:
> *
> * 1. Invalid (all zero) -> bypass/fault (init)
> - * 2. Bypass/fault -> translation/bypass (attach)
> - * 3. Translation/bypass -> bypass/fault (detach)
> + * 2. Bypass/fault -> single stage translation/bypass (attach)
> + * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
> + * 4. S2 -> S1 + S2 (attach_pasid_table)
> + * 5. S1 + S2 -> S2 (detach_pasid_table)

The following line "BUG_ON(ste_live && !nested);" forbids this transform.
And I have a look at the 6th patch, the transform seems S1 + S2 -> abort.
So after detach, the status is not the same as that before attach. Does it
match our expectation?

> *
> * Given that we can't update the STE atomically and the SMMU
> * doesn't read the thing in a defined order, that leaves us
> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> * 3. Update Config, sync
> */
> u64 val = le64_to_cpu(dst[0]);
> - bool ste_live = false;
> + bool s1_live = false, s2_live = false, ste_live;
> + bool abort, nested = false, translate = false;
> struct arm_smmu_device *smmu = NULL;
> struct arm_smmu_s1_cfg *s1_cfg;
> struct arm_smmu_s2_cfg *s2_cfg;
> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> default:
> break;
> }
> + nested = s1_cfg->set && s2_cfg->set;
> + translate = s1_cfg->set || s2_cfg->set;
> }
>
> if (val & STRTAB_STE_0_V) {
> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> case STRTAB_STE_0_CFG_BYPASS:
> break;
> case STRTAB_STE_0_CFG_S1_TRANS:
> + s1_live = true;
> + break;
> case STRTAB_STE_0_CFG_S2_TRANS:
> - ste_live = true;
> + s2_live = true;
> + break;
> + case STRTAB_STE_0_CFG_NESTED:
> + s1_live = true;
> + s2_live = true;
> break;
> case STRTAB_STE_0_CFG_ABORT:
> - BUG_ON(!disable_bypass);
> break;
> default:
> BUG(); /* STE corruption */
> }
> }
>
> + ste_live = s1_live || s2_live;
> +
> /* Nuke the existing STE_0 value, as we're going to rewrite it */
> val = STRTAB_STE_0_V;
>
> /* Bypass/fault */
> - if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
> - if (!smmu_domain && disable_bypass)
> +
> + if (!smmu_domain)
> + abort = disable_bypass;
> + else
> + abort = smmu_domain->abort;
> +
> + if (abort || !translate) {
> + if (abort)
> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
> else
> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> return;
> }
>
> + BUG_ON(ste_live && !nested);
> +
> + if (ste_live) {
> + /* First invalidate the live STE */
> + dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
> + arm_smmu_sync_ste_for_sid(smmu, sid);
> + }
> +
[...]

Thanks,
Keqian

2021-02-02 07:22:15

by Keqian Zhu

[permalink] [raw]
Subject: Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

Hi Eric,

On 2021/2/2 1:19, Auger Eric wrote:
> Hi Keqian,
>
> On 2/1/21 1:26 PM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2020/11/18 19:21, Eric Auger wrote:
>>> From: Jean-Philippe Brucker <[email protected]>
>>>
>>> When handling faults from the event or PRI queue, we need to find the
>>> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>>>
>>> Signed-off-by: Jean-Philippe Brucker <[email protected]>
>> [...]
>>
>>> }
>>>
>>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>>> + struct arm_smmu_master *master)
[...]

>>> kfree(master);
>>
>> Thanks,
>> Keqian
>>
> Thank you for the review. Jean will address this issues in his own
> series and on my end I will rebase on this latter.
>
> Best Regards
>
> Eric
>

Yeah, and hope this series can be accepted earlier ;-)

Thanks,
Keqian

2021-02-11 18:04:51

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

Hi Keqian,

On 2/2/21 8:14 AM, Keqian Zhu wrote:
> Hi Eric,
>
> On 2020/11/18 19:21, Eric Auger wrote:
>> When nested stage translation is setup, both s1_cfg and
>> s2_cfg are set.
>>
>> We introduce a new smmu domain abort field that will be set
>> upon guest stage1 configuration passing.
>>
>> arm_smmu_write_strtab_ent() is modified to write both stage
>> fields in the STE and deal with the abort field.
>>
>> In nested mode, only stage 2 is "finalized" as the host does
>> not own/configure the stage 1 context descriptor; guest does.
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>> v10 -> v11:
>> - Fix an issue reported by Shameer when switching from with vSMMU
>> to without vSMMU. Despite the spec does not seem to mention it
>> seems to be needed to reset the 2 high 64b when switching from
>> S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>> On some implementations, if the S2TTB is not reset, this causes
>> a C_BAD_STE error
>> ---
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +++++++++++++++++----
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +
>> 2 files changed, 56 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 18ac5af1b284..412ea1bafa50 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> * three cases at the moment:
>> *
>> * 1. Invalid (all zero) -> bypass/fault (init)
>> - * 2. Bypass/fault -> translation/bypass (attach)
>> - * 3. Translation/bypass -> bypass/fault (detach)
>> + * 2. Bypass/fault -> single stage translation/bypass (attach)
>> + * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
>> + * 4. S2 -> S1 + S2 (attach_pasid_table)
>> + * 5. S1 + S2 -> S2 (detach_pasid_table)
>
> The following line "BUG_ON(ste_live && !nested);" forbids this transform.

Yes as pointed out by Kunkun, there is always an abort in-between. I
will restore the original comment.

> And I have a look at the 6th patch, the transform seems S1 + S2 -> abort.
> So after detach, the status is not the same as that before attach. Does it
> match our expectation?

Indeed at detach time I think I should reset the abort() flag as this
latter is not imposed anymore by the guest.

Thanks!

Eric


>
>> *
>> * Given that we can't update the STE atomically and the SMMU
>> * doesn't read the thing in a defined order, that leaves us
>> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> * 3. Update Config, sync
>> */
>> u64 val = le64_to_cpu(dst[0]);
>> - bool ste_live = false;
>> + bool s1_live = false, s2_live = false, ste_live;
>> + bool abort, nested = false, translate = false;
>> struct arm_smmu_device *smmu = NULL;
>> struct arm_smmu_s1_cfg *s1_cfg;
>> struct arm_smmu_s2_cfg *s2_cfg;
>> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> default:
>> break;
>> }
>> + nested = s1_cfg->set && s2_cfg->set;
>> + translate = s1_cfg->set || s2_cfg->set;
>> }
>>
>> if (val & STRTAB_STE_0_V) {
>> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> case STRTAB_STE_0_CFG_BYPASS:
>> break;
>> case STRTAB_STE_0_CFG_S1_TRANS:
>> + s1_live = true;
>> + break;
>> case STRTAB_STE_0_CFG_S2_TRANS:
>> - ste_live = true;
>> + s2_live = true;
>> + break;
>> + case STRTAB_STE_0_CFG_NESTED:
>> + s1_live = true;
>> + s2_live = true;
>> break;
>> case STRTAB_STE_0_CFG_ABORT:
>> - BUG_ON(!disable_bypass);
>> break;
>> default:
>> BUG(); /* STE corruption */
>> }
>> }
>>
>> + ste_live = s1_live || s2_live;
>> +
>> /* Nuke the existing STE_0 value, as we're going to rewrite it */
>> val = STRTAB_STE_0_V;
>>
>> /* Bypass/fault */
>> - if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>> - if (!smmu_domain && disable_bypass)
>> +
>> + if (!smmu_domain)
>> + abort = disable_bypass;
>> + else
>> + abort = smmu_domain->abort;
>> +
>> + if (abort || !translate) {
>> + if (abort)
>> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
>> else
>> val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
>> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>> return;
>> }
>>
>> + BUG_ON(ste_live && !nested);
>> +
>> + if (ste_live) {
>> + /* First invalidate the live STE */
>> + dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
>> + arm_smmu_sync_ste_for_sid(smmu, sid);
>> + }
>> +
> [...]
>
> Thanks,
> Keqian
>

2021-02-21 18:25:17

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

Hi Shameer,
On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger [mailto:[email protected]]
>> Sent: 18 November 2020 11:22
>> To: [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]
>> Cc: [email protected]; [email protected];
>> [email protected]; [email protected]; Shameerali Kolothum
>> Thodi <[email protected]>;
>> [email protected]; [email protected]; [email protected];
>> [email protected]; yuzenghui <[email protected]>
>> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> The IOMMU API is extended to support 2 new API functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Then those capabilities gets implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API
>> which cascades them to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (so-called PASID
>> table) while the host owns stage 2 tables and main configuration
>> structures (STE).
>
> I am seeing an issue with Guest testpmd run with this series.
> I have two different setups and testpmd works fine with the
> first one but not with the second.
>
> 1). Guest doesn't have kernel driver built-in for pass-through dev.
>
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 0000
> Flags: fast devsel
> Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
> Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
>
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
>
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: Invalid NUMA socket, default to 0
> EAL: using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> EAL: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
>
> Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
>
> Configuring Port 0 (socket 0)
> Port 0: 8E:A6:8C:43:43:45
> Checking link statuses...
> Done
> testpmd>
>
> 2). Guest have kernel driver built-in for pass-through dev.
>
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 0000
> Flags: bus master, fast devsel, latency 0
> Memory at 8000100000 (64-bit, prefetchable) [size=64K]
> Memory at 8000000000 (64-bit, prefetchable) [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> Kernel driver in use: hns3
>
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
>
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: Invalid NUMA socket, default to 0
> EAL: using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> 0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from PF in_irq:0
> hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
> hns3vf_init_vf(): Failed to fetch configuration: -62
> hns3vf_dev_init(): Failed to init vf: -62
> EAL: Releasing pci mapped resource for 0000:00:02.0
> EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
> EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
> EAL: Requested device 0000:00:02.0 cannot be used
> EAL: Bus (pci) probe failed.
> EAL: No legacy callbacks, legacy socket not created
> testpmd: No probed ethernet devices
> Interactive-mode selected
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Done
> testpmd>
>
> And in this case, smmu(host) reports a translation fault,
>
> [ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
> [ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
> [ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
> [ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000
>
> Tested with Intel 82599 card(ixgbevf) as well. but same errror.

So this should be fixed in the next release. The problem came from the
fact the MSI giova was not duly unregistered. When vfio is not in used
on guest side, the guest kernel allocates giovas for MSIs @fffef000 - 40
is the ITS translater offset ;-) - When passthrough is in use, the iova
is allocated @0x8000000. As fffef000 MSI giova was not properly
unregistered, the host kernel used it - despite it has been unmapped by
the guest kernel -, hence the translation fault. So the fix is to
unregister the MSI in the VFIO QEMU code when msix are disabled. So to
me this is a QEMU integration issue.

Thank you very much for testing and reporting!

Thanks

Eric
>
> Not able to root cause the problem yet. With the hope that, this is
> related to tlb entries not being invlaidated properly, I tried explicitly
> issuing CMD_TLBI_NSNH_ALL and CMD_CFGI_CD_ALL just before
> the STE update, but no luck yet :(
>
> Please let me know if I am missing something here or has any clue if you
> can replicate this on your setup.
>
> Thanks,
> Shameer
>
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
>> (including the VFIO part in his last version: v11)
>>
>> The series includes a patch from Jean-Philippe. It is better to
>> review the original patch:
>> [PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure
>>
>> The VFIO series is sent separately.
>>
>> History:
>>
>> v12 -> v13:
>> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
>> reported by Shameer. This urged me to revisit patch 4 into
>> iommu/smmuv3: Allow s1 and s2 configs to coexist where
>> s1_cfg and s2_cfg are not dynamically allocated anymore.
>> Instead I use a new set field in existing structs
>> - fixed 2 others config checks
>> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
>> according to the last version
>>
>> v11 -> v12:
>> - rebase on top of v5.10-rc4
>>
>> Eric Auger (14):
>> iommu: Introduce attach/detach_pasid_table API
>> iommu: Introduce bind/unbind_guest_msi
>> iommu/smmuv3: Allow s1 and s2 configs to coexist
>> iommu/smmuv3: Get prepared for nested stage support
>> iommu/smmuv3: Implement attach/detach_pasid_table
>> iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>> iommu/smmuv3: Implement cache_invalidate
>> dma-iommu: Implement NESTED_MSI cookie
>> iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>> iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
>> regions
>> iommu/smmuv3: Implement bind/unbind_guest_msi
>> iommu/smmuv3: Report non recoverable faults
>> iommu/smmuv3: Accept configs with more than one context descriptor
>> iommu/smmuv3: Add PASID cache invalidation per PASID
>>
>> Jean-Philippe Brucker (1):
>> iommu/arm-smmu-v3: Maintain a SID->device structure
>>
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 659
>> ++++++++++++++++++--
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 103 ++-
>> drivers/iommu/dma-iommu.c | 142 ++++-
>> drivers/iommu/iommu.c | 105 ++++
>> include/linux/dma-iommu.h | 16 +
>> include/linux/iommu.h | 41 ++
>> include/uapi/linux/iommu.h | 54 ++
>> 7 files changed, 1042 insertions(+), 78 deletions(-)
>>
>> --
>> 2.21.3
>

Subject: RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)



> -----Original Message-----
> From: Auger Eric [mailto:[email protected]]
> Sent: 21 February 2021 18:21
> To: Shameerali Kolothum Thodi <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; yuzenghui <[email protected]>; Zengtao (B)
> <[email protected]>; [email protected]
> Subject: Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>
> Hi Shameer,
> On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Eric Auger [mailto:[email protected]]
> >> Sent: 18 November 2020 11:22
> >> To: [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected]; [email protected];
> >> [email protected]
> >> Cc: [email protected]; [email protected];
> >> [email protected]; [email protected]; Shameerali Kolothum
> >> Thodi <[email protected]>;
> >> [email protected]; [email protected]; [email protected];
> >> [email protected]; yuzenghui <[email protected]>
> >> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> >>
> >> This series brings the IOMMU part of HW nested paging support
> >> in the SMMUv3. The VFIO part is submitted separately.
> >>
> >> The IOMMU API is extended to support 2 new API functionalities:
> >> 1) pass the guest stage 1 configuration
> >> 2) pass stage 1 MSI bindings
> >>
> >> Then those capabilities gets implemented in the SMMUv3 driver.
> >>
> >> The virtualizer passes information through the VFIO user API
> >> which cascades them to the iommu subsystem. This allows the guest
> >> to own stage 1 tables and context descriptors (so-called PASID
> >> table) while the host owns stage 2 tables and main configuration
> >> structures (STE).
> >
> > I am seeing an issue with Guest testpmd run with this series.
> > I have two different setups and testpmd works fine with the
> > first one but not with the second.
> >
> > 1). Guest doesn't have kernel driver built-in for pass-through dev.
> >
> > root@ubuntu:/# lspci -v
> > ...
> > 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev
> 21)
> > Subsystem: Huawei Technologies Co., Ltd. Device 0000
> > Flags: fast devsel
> > Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
> > Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
> > Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> > Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> > Capabilities: [b0] Power Management version 3
> > Capabilities: [100] Access Control Services
> > Capabilities: [300] Transaction Processing Hints
> >
> > root@ubuntu:/# echo vfio-pci >
> /sys/bus/pci/devices/0000:00:02.0/driver_override
> > root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
> >
> > root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix
> socket0 -l 0-1 -n 2 -- -i
> > EAL: Detected 8 lcore(s)
> > EAL: Detected 1 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > EAL: No available hugepages reported in hugepages-32768kB
> > EAL: No available hugepages reported in hugepages-64kB
> > EAL: No available hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > EAL: VFIO support initialized
> > EAL: Invalid NUMA socket, default to 0
> > EAL: using IOMMU type 1 (Type 1)
> > EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket
> 0)
> > EAL: No legacy callbacks, legacy socket not created
> > Interactive-mode selected
> > testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456,
> size=2176, socket=0
> > testpmd: preferred mempool ops selected: ring_mp_mc
> >
> > Warning! port-topology=paired and odd forward ports number, the last port
> will pair with itself.
> >
> > Configuring Port 0 (socket 0)
> > Port 0: 8E:A6:8C:43:43:45
> > Checking link statuses...
> > Done
> > testpmd>
> >
> > 2). Guest have kernel driver built-in for pass-through dev.
> >
> > root@ubuntu:/# lspci -v
> > ...
> > 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev
> 21)
> > Subsystem: Huawei Technologies Co., Ltd. Device 0000
> > Flags: bus master, fast devsel, latency 0
> > Memory at 8000100000 (64-bit, prefetchable) [size=64K]
> > Memory at 8000000000 (64-bit, prefetchable) [size=1M]
> > Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> > Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> > Capabilities: [b0] Power Management version 3
> > Capabilities: [100] Access Control Services
> > Capabilities: [300] Transaction Processing Hints
> > Kernel driver in use: hns3
> >
> > root@ubuntu:/# echo vfio-pci >
> /sys/bus/pci/devices/0000:00:02.0/driver_override
> > root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> > root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
> >
> > root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix
> socket0 -l 0-1 -n 2 -- -i
> > EAL: Detected 8 lcore(s)
> > EAL: Detected 1 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > EAL: No available hugepages reported in hugepages-32768kB
> > EAL: No available hugepages reported in hugepages-64kB
> > EAL: No available hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > EAL: VFIO support initialized
> > EAL: Invalid NUMA socket, default to 0
> > EAL: using IOMMU type 1 (Type 1)
> > EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket
> 0)
> > 0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0)
> lost(1) from PF in_irq:0
> > hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
> > hns3vf_init_vf(): Failed to fetch configuration: -62
> > hns3vf_dev_init(): Failed to init vf: -62
> > EAL: Releasing pci mapped resource for 0000:00:02.0
> > EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
> > EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
> > EAL: Requested device 0000:00:02.0 cannot be used
> > EAL: Bus (pci) probe failed.
> > EAL: No legacy callbacks, legacy socket not created
> > testpmd: No probed ethernet devices
> > Interactive-mode selected
> > testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456,
> size=2176, socket=0
> > testpmd: preferred mempool ops selected: ring_mp_mc
> > Done
> > testpmd>
> >
> > And in this case, smmu(host) reports a translation fault,
> >
> > [ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> > [ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
> > [ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
> > [ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
> > [ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000
> >
> > Tested with Intel 82599 card(ixgbevf) as well. but same errror.
>
> So this should be fixed in the next release. The problem came from the
> fact the MSI giova was not duly unregistered. When vfio is not in used
> on guest side, the guest kernel allocates giovas for MSIs @fffef000 - 40
> is the ITS translater offset ;-) - When passthrough is in use, the iova
> is allocated @0x8000000. As fffef000 MSI giova was not properly
> unregistered, the host kernel used it - despite it has been unmapped by
> the guest kernel -, hence the translation fault. So the fix is to
> unregister the MSI in the VFIO QEMU code when msix are disabled. So to
> me this is a QEMU integration issue.

Super!. I was focusing on the TLBI side and was slightly worried it is somehow
related our specific hardware. That’s a relief :).

Thanks,
Shameer


2021-03-15 23:12:25

by Krishna Reddy

[permalink] [raw]
Subject: RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

Tested-by: Krishna Reddy <[email protected]>

> 1) pass the guest stage 1 configuration

Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8.
NVMe PCIe device is functional with 2-stage translations and no issues observed.

-KR

2021-03-16 09:36:12

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

Hi Krishna,
On 3/15/21 7:04 PM, Krishna Reddy wrote:
> Tested-by: Krishna Reddy <[email protected]>
>
>> 1) pass the guest stage 1 configuration
>
> Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8.
> NVMe PCIe device is functional with 2-stage translations and no issues observed.
Thank you very much for your testing efforts. For your info, there are
more recent kernel series:
[PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) (Feb 23)
[PATCH v12 00/13] SMMUv3 Nested Stage Setup (VFIO part) (Feb 23)

working along with QEMU RFC
[RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration (Feb 25)

If you have cycles to test with those, this would be higly appreciated.

Thanks

Eric
>
> -KR
>

2021-03-16 21:21:49

by Krishna Reddy

[permalink] [raw]
Subject: RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

> Hi Krishna,
> On 3/15/21 7:04 PM, Krishna Reddy wrote:
> > Tested-by: Krishna Reddy <[email protected]>
> >
> >> 1) pass the guest stage 1 configuration
> >
> > Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM
> along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and
> QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from
> v5.2.0-2stage-rfcv8.
> > NVMe PCIe device is functional with 2-stage translations and no issues
> observed.
> Thank you very much for your testing efforts. For your info, there are more
> recent kernel series:
> [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) (Feb 23) [PATCH
> v12 00/13] SMMUv3 Nested Stage Setup (VFIO part) (Feb 23)
>
> working along with QEMU RFC
> [RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration (Feb 25)
>
> If you have cycles to test with those, this would be higly appreciated.

Thanks Eric for the latest patches. Will validate and update.
Feel free to reach out me for validating future patch sets as necessary.

-KR