2024-04-30 04:44:48

by Nicolin Chen

Subject: [PATCH v6 0/6] Add Tegra241 (Grace) CMDQV Support (part 1/2)

NVIDIA's Tegra241 (Grace) SoC has CMDQ-Virtualization (CMDQV) hardware
that extends the standard ARM SMMUv3 to support multiple command queues
with virtualization capabilities. Though this is similar to the ECMDQ in
SMMUv3.3, CMDQV additionally provides Virtual Interfaces (VINTFs), so
that VMs can own their VINTFs and Virtual Command Queues (VCMDQs). When
a VCMDQ is used exclusively by a VM, it can execute only a limited set
of commands, mainly invalidation commands, compared to the standard
SMMUv3 CMDQ.

Support is therefore added in two patch series: the basic in-kernel
support as part 1, and the user-space support as part 2.

The in-kernel support detects and configures the CMDQV hardware, then
allocates a VINTF with some VCMDQs for the kernel/hypervisor to use. Like
ECMDQ, CMDQV also lets the kernel use multiple VCMDQs, giving a limited
performance improvement: a multi-threaded DMA unmap benchmark measured up
to a 20% reduction in TLB invalidation time compared to a single queue.
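
For illustration only, the multi-queue selection described above can be
sketched in plain user-space C. All names below (sketch_queue,
sketch_vintf, sketch_pick_queue) are hypothetical stand-ins rather than
the driver's types; the sketch only assumes that a CPU index is mapped to
a queue by modulo and that the standard CMDQ serves as the fallback:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's queue objects. */
struct sketch_queue {
	bool enabled;			/* cached "queue is enabled" status */
};

struct sketch_vintf {
	bool enabled;			/* interface is up */
	bool has_error;			/* interface reported an error */
	unsigned int num_queues;	/* VCMDQs assigned to this VINTF */
	struct sketch_queue *queues;
};

/*
 * Pick a queue for the given CPU. Returns the fallback (the standard
 * SMMU CMDQ in the real driver) whenever the virtual queue is unusable.
 */
static struct sketch_queue *
sketch_pick_queue(struct sketch_vintf *vintf, unsigned int cpu,
		  struct sketch_queue *fallback)
{
	struct sketch_queue *q;

	if (!vintf->enabled || vintf->has_error || vintf->num_queues == 0)
		return fallback;

	/* Spread CPUs across queues so they contend on different locks. */
	q = &vintf->queues[cpu % vintf->num_queues];
	return q->enabled ? q : fallback;
}
```

With four queues, CPU 5 would land on queue 1; a disabled queue or an
interface in error state falls back to the standard queue.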

The user-space support provides uAPIs (via IOMMUFD) that let user-space
hypervisors pass VCMDQs through to VMs, so that these VMs can access the
VCMDQs directly without trapping, i.e. with no VM Exits. This gives large
performance improvements: various DMA unmap tests running in a guest OS
measured 70% to 90% reductions in TLB invalidation time compared to a
nested SMMU CMDQ (with trapping).

This is the part-1 series:
- Preparatory changes to share the existing SMMU functions
- A new CMDQV driver, plus SMMUv3 driver extensions to interact with it
- A limit on the commands available to a guest kernel

It's available on GitHub:
https://github.com/nicolinc/iommufd/commits/vcmdq_in_kernel-v6

The part-2 RFC series has also been sent for discussion:
https://lore.kernel.org/all/[email protected]/

Note that this in-kernel support isn't confined to host kernels running
on Grace-powered servers; it is also used by guest kernels running in
VMs on those servers. Those VMs should therefore install the driver,
ideally before part 2 is merged, so that those servers later only need
to upgrade their host kernels without touching the VMs.

Thank you!

Changelog
v6:
* Reordered the patch sequence to fix git-bisect break
* Added a status cache to cmdqv/vintf/vcmdq structure
* Added gerror/gerrorn value match in hw_deinit()
* Minimized changes in __arm_smmu_cmdq_skip_err()
* Preallocated VCMDQs to VINTFs for stability
v5:
https://lore.kernel.org/all/[email protected]/
* Improved print/mmio helpers
* Added proper register reset routines
* Reorganized init/deinit functions to share with VIOMMU callbacks in
the upcoming part-2 user-space series (RFC)
v4:
https://lore.kernel.org/all/[email protected]/
* Rebased on v6.9-rc1
* Renamed to "tegra241-cmdqv", following other Grace kernel patches
* Added a set of print and MMIO helpers
* Reworked the guest limitation patch
v3:
https://lore.kernel.org/all/[email protected]/
* Dropped VMID and mdev patches to redesign later based on IOMMUFD
* Separated HYP_OWN part for guest support into a new patch
* Added new preparatory changes
v2:
https://lore.kernel.org/all/[email protected]/
* Added mdev interface support for hypervisor and VMs
* Added preparatory changes for mdev interface implementation
* PATCH-12 Changed ->issue_cmdlist() to ->get_cmdq() for a better
integration with recently merged ECMDQ-related changes
v1:
https://lore.kernel.org/all/[email protected]/

Nate Watterson (1):
iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace)
CMDQV

Nicolin Chen (5):
iommu/arm-smmu-v3: Pass in cmdq pointer to
arm_smmu_cmdq_issue_cmdlist()
iommu/arm-smmu-v3: Add CS_NONE quirk
iommu/arm-smmu-v3: Make arm_smmu_cmdq_init reusable
iommu/arm-smmu-v3: Make __arm_smmu_cmdq_skip_err reusable
iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF

MAINTAINERS | 1 +
drivers/iommu/Kconfig | 12 +
drivers/iommu/arm/arm-smmu-v3/Makefile | 1 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 47 +
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 857 ++++++++++++++++++
6 files changed, 961 insertions(+), 22 deletions(-)
create mode 100644 drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c

--
2.43.0



2024-04-30 04:44:54

by Nicolin Chen

Subject: [PATCH v6 3/6] iommu/arm-smmu-v3: Make arm_smmu_cmdq_init reusable

The CMDQV extension in the NVIDIA Tegra241 SoC reuses the arm_smmu_cmdq
structure, but its queues are located elsewhere than smmu->cmdq.

Add a cmdq argument to the arm_smmu_cmdq_init() function and declare it
in the header for the CMDQV driver to use.
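
As a rough user-space illustration of this refactor (not the kernel
code), the sketch below passes the queue explicitly instead of deriving
it from the device. The types sketch_smmu and sketch_cmdq and the
function sketch_cmdq_init() are hypothetical; only the computation of the
entry count as 1 << max_n_shift mirrors the patch:

```c
#include <stddef.h>

struct sketch_cmdq {
	unsigned int max_n_shift;	/* log2 of the number of entries */
	size_t nents;			/* filled in by init */
};

struct sketch_smmu {
	struct sketch_cmdq cmdq;	/* the device's built-in queue */
};

/* The queue is an explicit argument instead of being derived from smmu. */
static int sketch_cmdq_init(struct sketch_smmu *smmu, struct sketch_cmdq *cmdq)
{
	(void)smmu;	/* unused in this sketch; kept to mirror the new signature */
	cmdq->nents = (size_t)1 << cmdq->max_n_shift;
	return 0;
}
```

The same function can then initialize either the built-in queue
(&smmu->cmdq) or a queue owned by another driver.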

Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +++
2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b3d03ca01adc..538850059bdd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3145,9 +3145,9 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
return 0;
}

-static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu)
+int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq *cmdq)
{
- struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
unsigned int nents = 1 << cmdq->q.llq.max_n_shift;

atomic_set(&cmdq->owner_prod, 0);
@@ -3172,7 +3172,7 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
if (ret)
return ret;

- ret = arm_smmu_cmdq_init(smmu);
+ ret = arm_smmu_cmdq_init(smmu, &smmu->cmdq);
if (ret)
return ret;

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index bbee08e82943..ab2824e46ac5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -760,6 +760,9 @@ bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
unsigned long iova, size_t size);

+int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq *cmdq);
+
#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
--
2.43.0


2024-04-30 04:45:17

by Nicolin Chen

Subject: [PATCH v6 4/6] iommu/arm-smmu-v3: Make __arm_smmu_cmdq_skip_err reusable

Allow the __arm_smmu_cmdq_skip_err() function to be reused by the NVIDIA
Tegra241 CMDQV unit, since it uses the same queue data structure. Also
honor the CMDQ_QUIRK_SYNC_CS_NONE_ONLY quirk when inserting a CMD_SYNC.
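
To illustrate what the quirk changes, here is a minimal user-space
sketch of building the first word of a CMD_SYNC. The SKETCH_* names are
hypothetical, and the encodings (opcode 0x46 in bits [7:0], CS in bits
[13:12], CS_NONE = 0, CS_SEV = 2) are assumptions taken from the SMMUv3
driver header as I understand it; they should be checked against the
header before being relied on:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_OP_CMD_SYNC	0x46u	/* assumed CMD_SYNC opcode, bits [7:0] */
#define SKETCH_SYNC_CS_SHIFT	12	/* assumed CS field position, bits [13:12] */
#define SKETCH_SYNC_CS_NONE	0u	/* no completion signal */
#define SKETCH_SYNC_CS_SEV	2u	/* completion signalled by an event */

/* Build word 0 of a CMD_SYNC; a CS_NONE-only queue must not request SEV. */
static uint64_t sketch_build_sync(bool cs_none_only)
{
	uint64_t cs = cs_none_only ? SKETCH_SYNC_CS_NONE : SKETCH_SYNC_CS_SEV;

	return SKETCH_OP_CMD_SYNC | (cs << SKETCH_SYNC_CS_SHIFT);
}
```

Under these assumptions, a queue with the quirk set gets a CMD_SYNC whose
CS field is zero, while other queues keep the SEV completion signal.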

Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 +++++--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 538850059bdd..5111859347d5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -379,8 +379,8 @@ static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
arm_smmu_cmdq_build_cmd(cmd, &ent);
}

-static void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
- struct arm_smmu_queue *q)
+void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
+ struct arm_smmu_queue *q)
{
static const char * const cerror_str[] = {
[CMDQ_ERR_CERROR_NONE_IDX] = "No error",
@@ -428,6 +428,9 @@ static void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
for (i = 0; i < ARRAY_SIZE(cmd); ++i)
dev_err(smmu->dev, "\t0x%016llx\n", (unsigned long long)cmd[i]);

+ if (q->quirks & CMDQ_QUIRK_SYNC_CS_NONE_ONLY)
+ cmd_sync.sync.cs_none = true;
+
/* Convert the erroneous command into a CMD_SYNC */
arm_smmu_cmdq_build_cmd(cmd, &cmd_sync);

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ab2824e46ac5..32e7fc5e1794 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -762,6 +762,8 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,

int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq *cmdq);
+void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
+ struct arm_smmu_queue *q);

#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
--
2.43.0


2024-04-30 04:45:38

by Nicolin Chen

Subject: [PATCH v6 5/6] iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV

From: Nate Watterson <[email protected]>

NVIDIA's Tegra241 SoC has CMDQ-Virtualization (CMDQV) hardware, extending
the standard ARM SMMU v3 IP to support multiple VCMDQs with virtualization
capabilities. As command queues, they are very similar to a standard SMMU
CMDQ (or ECMDQs), but only support CS_NONE in the CS field of CMD_SYNC.

Add a new tegra241-cmdqv driver, and insert its structure pointer into the
existing arm_smmu_device, and then add related function calls in the SMMUv3
driver to interact with the CMDQV driver.

In the CMDQV driver, add a minimal part for the in-kernel support: reserve
VINTF0 for in-kernel use, and assign some of the VCMDQs to the VINTF0, and
select one VCMDQ based on the current CPU ID to execute supported commands.
This multi-queue design for in-kernel use gives some limited improvements:
up to 20% reduction of invalidation time was measured by a multi-threaded
DMA unmap benchmark, compared to a single queue.

The other part of the CMDQV driver will be user-space support that lets a
hypervisor running on the host OS talk to the driver for virtualization
use cases, allowing VMs to use VCMDQs without trapping, i.e. no VM Exits.
This is currently WIP based on IOMMUFD, and will be sent for review after
the SMMU nesting patches are merged. This part will give a guest OS a
bigger improvement: 70% to 90% reductions of TLB invalidation time were
measured by DMA unmap tests running in a guest OS, compared to a nested
SMMU CMDQ (with trapping).

However, it is very important for this in-kernel support to get merged and
installed to VMs running on Grace-powered servers as soon as possible. So,
later those servers would only need to upgrade their host kernels for the
user-space support.

As the initial version, the CMDQV driver only supports ACPI configurations.

Signed-off-by: Nate Watterson <[email protected]>
Co-developed-by: Nicolin Chen <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
MAINTAINERS | 1 +
drivers/iommu/Kconfig | 12 +
drivers/iommu/arm/arm-smmu-v3/Makefile | 1 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 37 +
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 815 ++++++++++++++++++
6 files changed, 882 insertions(+), 6 deletions(-)
create mode 100644 drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f6dc90559341..8a799dbc300b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21742,6 +21742,7 @@ M: Thierry Reding <[email protected]>
R: Krishna Reddy <[email protected]>
L: [email protected]
S: Supported
+F: drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
F: drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
F: drivers/iommu/tegra*

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 0af39bbbe3a3..82e557de31e3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -410,6 +410,18 @@ config ARM_SMMU_V3_SVA
Say Y here if your system supports SVA extensions such as PCIe PASID
and PRI.

+config TEGRA241_CMDQV
+ bool "NVIDIA Tegra241 CMDQ-V extension support for ARM SMMUv3"
+ depends on ARM_SMMU_V3
+ depends on ACPI
+ help
+ Support for NVIDIA CMDQ-Virtualization extension for ARM SMMUv3. The
+ CMDQ-V extension is similar to v3.3 ECMDQ for multi command queues
+ support, except with virtualization capabilities.
+
+ Say Y here if your system is NVIDIA Tegra241 (Grace) or it has the same
+ CMDQ-V extension.
+
config S390_IOMMU
def_bool y if S390 && PCI
depends on S390 && PCI
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 54feb1ecccad..8dff2bc4c7f3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -2,4 +2,5 @@
obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
arm_smmu_v3-objs-y += arm-smmu-v3.o
arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
+arm_smmu_v3-objs-$(CONFIG_TEGRA241_CMDQV) += tegra241-cmdqv.o
arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5111859347d5..665a5e585f72 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -354,6 +354,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)

static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
{
+ if (smmu->tegra241_cmdqv)
+ return tegra241_cmdqv_get_cmdq(smmu);
+
return &smmu->cmdq;
}

@@ -3105,12 +3108,10 @@ static struct iommu_ops arm_smmu_ops = {
};

/* Probing and initialisation functions */
-static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
- struct arm_smmu_queue *q,
- void __iomem *page,
- unsigned long prod_off,
- unsigned long cons_off,
- size_t dwords, const char *name)
+int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
+ struct arm_smmu_queue *q, void __iomem *page,
+ unsigned long prod_off, unsigned long cons_off,
+ size_t dwords, const char *name)
{
size_t qsz;

@@ -3567,6 +3568,12 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
return ret;
}

+ if (smmu->tegra241_cmdqv) {
+ ret = tegra241_cmdqv_device_reset(smmu);
+ if (ret)
+ return ret;
+ }
+
/* Invalidate any cached configuration */
cmd.opcode = CMDQ_OP_CFGI_ALL;
arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
@@ -3941,6 +3948,9 @@ static int arm_smmu_device_acpi_probe(struct platform_device *pdev,
if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE)
smmu->features |= ARM_SMMU_FEAT_COHERENCY;

+ smmu->tegra241_cmdqv =
+ tegra241_cmdqv_acpi_probe(smmu, node->identifier);
+
return 0;
}
#else
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 32e7fc5e1794..87e4c227a937 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -626,6 +626,8 @@ struct arm_smmu_strtab_cfg {
u32 strtab_base_cfg;
};

+struct tegra241_cmdqv;
+
/* An SMMUv3 instance */
struct arm_smmu_device {
struct device *dev;
@@ -689,6 +691,12 @@ struct arm_smmu_device {

struct rb_root streams;
struct mutex streams_mutex;
+
+ /*
+ * Pointer to NVIDIA Tegra241 CMDQ-Virtualization Extension support,
+ * similar to v3.3 ECMDQ except with virtualization capabilities.
+ */
+ struct tegra241_cmdqv *tegra241_cmdqv;
};

struct arm_smmu_stream {
@@ -764,6 +772,10 @@ int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq *cmdq);
void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
struct arm_smmu_queue *q);
+int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
+ struct arm_smmu_queue *q, void __iomem *page,
+ unsigned long prod_off, unsigned long cons_off,
+ size_t dwords, const char *name);

#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
@@ -820,4 +832,29 @@ static inline void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain,
{
}
#endif /* CONFIG_ARM_SMMU_V3_SVA */
+
+#ifdef CONFIG_TEGRA241_CMDQV
+struct tegra241_cmdqv *
+tegra241_cmdqv_acpi_probe(struct arm_smmu_device *smmu, int id);
+int tegra241_cmdqv_device_reset(struct arm_smmu_device *smmu);
+struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu);
+#else /* CONFIG_TEGRA241_CMDQV */
+static inline struct tegra241_cmdqv *
+tegra241_cmdqv_acpi_probe(struct arm_smmu_device *smmu, int id)
+{
+ return NULL;
+}
+
+static inline int tegra241_cmdqv_device_reset(struct arm_smmu_device *smmu)
+{
+ return -ENODEV;
+}
+
+static inline struct arm_smmu_cmdq *
+tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
+{
+ return NULL;
+}
+#endif /* CONFIG_TEGRA241_CMDQV */
+
#endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
new file mode 100644
index 000000000000..4b2af3aaa6b4
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -0,0 +1,815 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2021-2024 NVIDIA CORPORATION & AFFILIATES. */
+
+#define dev_fmt(fmt) "tegra241_cmdqv: " fmt
+
+#include <linux/acpi.h>
+#include <linux/debugfs.h>
+#include <linux/dma-mapping.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/iopoll.h>
+
+#include <acpi/acpixf.h>
+
+#include "arm-smmu-v3.h"
+
+#define TEGRA241_CMDQV_HID "NVDA200C"
+
+/* CMDQV register page base and size defines */
+#define TEGRA241_CMDQV_CONFIG_BASE (0)
+#define TEGRA241_CMDQV_CONFIG_SIZE (SZ_64K)
+#define TEGRA241_VCMDQ_PAGE0_BASE (TEGRA241_CMDQV_CONFIG_BASE + SZ_64K)
+#define TEGRA241_VCMDQ_PAGE1_BASE (TEGRA241_VCMDQ_PAGE0_BASE + SZ_64K)
+#define TEGRA241_VINTF_PAGE_BASE (TEGRA241_VCMDQ_PAGE1_BASE + SZ_64K)
+
+/* CMDQV global config regs */
+#define TEGRA241_CMDQV_CONFIG 0x0000
+#define CMDQV_EN BIT(0)
+
+#define TEGRA241_CMDQV_PARAM 0x0004
+#define CMDQV_NUM_VINTF_LOG2 GENMASK(11, 8)
+#define CMDQV_NUM_VCMDQ_LOG2 GENMASK(7, 4)
+
+#define TEGRA241_CMDQV_STATUS 0x0008
+#define CMDQV_ENABLED BIT(0)
+
+#define TEGRA241_CMDQV_VINTF_ERR_MAP 0x0014
+#define TEGRA241_CMDQV_VINTF_INT_MASK 0x001C
+#define TEGRA241_CMDQV_VCMDQ_ERR_MAP0 0x0024
+#define TEGRA241_CMDQV_VCMDQ_ERR_MAP(i) (0x0024 + 0x4*(i))
+
+#define TEGRA241_CMDQV_CMDQ_ALLOC(q) (0x0200 + 0x4*(q))
+#define CMDQV_CMDQ_ALLOC_VINTF GENMASK(20, 15)
+#define CMDQV_CMDQ_ALLOC_LVCMDQ GENMASK(7, 1)
+#define CMDQV_CMDQ_ALLOCATED BIT(0)
+
+/* VINTF config regs */
+#define TEGRA241_VINTF(v) (0x1000 + 0x100*(v))
+
+#define TEGRA241_VINTF_CONFIG 0x0000
+#define VINTF_HYP_OWN BIT(17)
+#define VINTF_VMID GENMASK(16, 1)
+#define VINTF_EN BIT(0)
+
+#define TEGRA241_VINTF_STATUS 0x0004
+#define VINTF_STATUS GENMASK(3, 1)
+#define VINTF_ENABLED BIT(0)
+
+#define TEGRA241_VINTF_CMDQ_ERR_MAP(m) (0x00C0 + 0x4*(m))
+
+/* VCMDQ config regs */
+/* -- PAGE0 -- */
+#define TEGRA241_VCMDQ_PAGE0(q) (TEGRA241_VCMDQ_PAGE0_BASE + 0x80*(q))
+
+#define TEGRA241_VCMDQ_CONS 0x00000
+#define VCMDQ_CONS_ERR GENMASK(30, 24)
+
+#define TEGRA241_VCMDQ_PROD 0x00004
+
+#define TEGRA241_VCMDQ_CONFIG 0x00008
+#define VCMDQ_EN BIT(0)
+
+#define TEGRA241_VCMDQ_STATUS 0x0000C
+#define VCMDQ_ENABLED BIT(0)
+
+#define TEGRA241_VCMDQ_GERROR 0x00010
+#define TEGRA241_VCMDQ_GERRORN 0x00014
+
+/* -- PAGE1 -- */
+#define TEGRA241_VCMDQ_PAGE1(q) (TEGRA241_VCMDQ_PAGE1_BASE + 0x80*(q))
+#define VCMDQ_ADDR GENMASK(47, 5)
+#define VCMDQ_LOG2SIZE GENMASK(4, 0)
+
+#define TEGRA241_VCMDQ_BASE 0x00000
+#define TEGRA241_VCMDQ_CONS_INDX_BASE 0x00008
+
+/* VINTF logical-VCMDQ pages */
+#define TEGRA241_VINTFi_PAGE0(i) (TEGRA241_VINTF_PAGE_BASE + SZ_128K*(i))
+#define TEGRA241_VINTFi_PAGE1(i) (TEGRA241_VINTFi_PAGE0(i) + SZ_64K)
+#define TEGRA241_VINTFi_LVCMDQ_PAGE0(i, q) \
+ (TEGRA241_VINTFi_PAGE0(i) + 0x80*(q))
+#define TEGRA241_VINTFi_LVCMDQ_PAGE1(i, q) \
+ (TEGRA241_VINTFi_PAGE1(i) + 0x80*(q))
+
+/* MMIO helpers */
+#define cmdqv_readl(reg) \
+ readl(cmdqv->base + TEGRA241_CMDQV_##reg)
+#define cmdqv_readl_relaxed(reg) \
+ readl_relaxed(cmdqv->base + TEGRA241_CMDQV_##reg)
+#define cmdqv_writel(val, reg) \
+ writel((val), cmdqv->base + TEGRA241_CMDQV_##reg)
+#define cmdqv_writel_relaxed(val, reg) \
+ writel_relaxed((val), cmdqv->base + TEGRA241_CMDQV_##reg)
+
+#define vintf_readl(reg) \
+ readl(vintf->base + TEGRA241_VINTF_##reg)
+#define vintf_readl_relaxed(reg) \
+ readl_relaxed(vintf->base + TEGRA241_VINTF_##reg)
+#define vintf_writel(val, reg) \
+ writel((val), vintf->base + TEGRA241_VINTF_##reg)
+#define vintf_writel_relaxed(val, reg) \
+ writel_relaxed((val), vintf->base + TEGRA241_VINTF_##reg)
+
+#define vcmdq_page0_readl(reg) \
+ readl(vcmdq->page0 + TEGRA241_VCMDQ_##reg)
+#define vcmdq_page0_readl_relaxed(reg) \
+ readl_relaxed(vcmdq->page0 + TEGRA241_VCMDQ_##reg)
+#define vcmdq_page0_writel(val, reg) \
+ writel((val), vcmdq->page0 + TEGRA241_VCMDQ_##reg)
+#define vcmdq_page0_writel_relaxed(val, reg) \
+ writel_relaxed((val), vcmdq->page0 + TEGRA241_VCMDQ_##reg)
+
+#define vcmdq_page1_readl(reg) \
+ readl(vcmdq->page1 + TEGRA241_VCMDQ_##reg)
+#define vcmdq_page1_readl_relaxed(reg) \
+ readl_relaxed(vcmdq->page1 + TEGRA241_VCMDQ_##reg)
+#define vcmdq_page1_readq_relaxed(reg) \
+ readq_relaxed(vcmdq->page1 + TEGRA241_VCMDQ_##reg)
+#define vcmdq_page1_writel(val, reg) \
+ writel((val), vcmdq->page1 + TEGRA241_VCMDQ_##reg)
+#define vcmdq_page1_writel_relaxed(val, reg) \
+ writel_relaxed((val), vcmdq->page1 + TEGRA241_VCMDQ_##reg)
+#define vcmdq_page1_writeq(val, reg) \
+ writeq((val), vcmdq->page1 + TEGRA241_VCMDQ_##reg)
+#define vcmdq_page1_writeq_relaxed(val, reg) \
+ writeq_relaxed((val), vcmdq->page1 + TEGRA241_VCMDQ_##reg)
+
+/* Logging helpers */
+#define cmdqv_warn(fmt, ...) \
+ dev_warn(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
+#define cmdqv_err(fmt, ...) \
+ dev_err(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
+#define cmdqv_info(fmt, ...) \
+ dev_info(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
+#define cmdqv_dbg(fmt, ...) \
+ dev_dbg(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
+
+#define vintf_warn(fmt, ...) \
+ dev_warn(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
+#define vintf_err(fmt, ...) \
+ dev_err(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
+#define vintf_info(fmt, ...) \
+ dev_info(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
+#define vintf_dbg(fmt, ...) \
+ dev_dbg(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
+
+#define vcmdq_warn(fmt, ...) \
+ ({ \
+ struct tegra241_vintf *vintf = vcmdq->vintf; \
+ if (vintf) \
+ vintf_warn("VCMDQ%u/LVCMDQ%u: " fmt, \
+ vcmdq->idx, vcmdq->lidx, \
+ ##__VA_ARGS__); \
+ else \
+ dev_warn(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt, \
+ vcmdq->idx, ##__VA_ARGS__); \
+ })
+#define vcmdq_err(fmt, ...) \
+ ({ \
+ struct tegra241_vintf *vintf = vcmdq->vintf; \
+ if (vintf) \
+ vintf_err("VCMDQ%u/LVCMDQ%u: " fmt, \
+ vcmdq->idx, vcmdq->lidx, \
+ ##__VA_ARGS__); \
+ else \
+ dev_err(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt, \
+ vcmdq->idx, ##__VA_ARGS__); \
+ })
+#define vcmdq_info(fmt, ...) \
+ ({ \
+ struct tegra241_vintf *vintf = vcmdq->vintf; \
+ if (vintf) \
+ vintf_info("VCMDQ%u/LVCMDQ%u: " fmt, \
+ vcmdq->idx, vcmdq->lidx, \
+ ##__VA_ARGS__); \
+ else \
+ dev_info(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt, \
+ vcmdq->idx, ##__VA_ARGS__); \
+ })
+#define vcmdq_dbg(fmt, ...) \
+ ({ \
+ struct tegra241_vintf *vintf = vcmdq->vintf; \
+ if (vintf) \
+ vintf_dbg("VCMDQ%u/LVCMDQ%u: " fmt, \
+ vcmdq->idx, vcmdq->lidx, \
+ ##__VA_ARGS__); \
+ else \
+ dev_dbg(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt, \
+ vcmdq->idx, ##__VA_ARGS__); \
+ })
+
+/* Configuring and polling helpers */
+#define tegra241_cmdqv_write_config(_owner, _OWNER, _regval) \
+ ({ \
+ bool _en = (_regval) & _OWNER##_EN; \
+ u32 _status; \
+ int _ret; \
+ writel((_regval), _owner->base + TEGRA241_##_OWNER##_CONFIG); \
+ _ret = readl_poll_timeout( \
+ _owner->base + TEGRA241_##_OWNER##_STATUS, _status, \
+ _en ? (_status & _OWNER##_ENABLED) : \
+ !(_status & _OWNER##_ENABLED), \
+ 1, ARM_SMMU_POLL_TIMEOUT_US); \
+ if (_ret) \
+ _owner##_err("failed to %sable, STATUS = 0x%08X\n", \
+ _en ? "en" : "dis", _status); \
+ atomic_set(&_owner->status, _status); \
+ _ret; \
+ })
+
+#define cmdqv_write_config(_regval) \
+ tegra241_cmdqv_write_config(cmdqv, CMDQV, _regval)
+#define vintf_write_config(_regval) \
+ tegra241_cmdqv_write_config(vintf, VINTF, _regval)
+#define vcmdq_write_config(_regval) \
+ tegra241_cmdqv_write_config(vcmdq, VCMDQ, _regval)
+
+static bool disable_cmdqv;
+module_param(disable_cmdqv, bool, 0444);
+MODULE_PARM_DESC(disable_cmdqv,
+ "This allows to disable CMDQV HW and use default SMMU internal CMDQ.");
+
+static bool bypass_vcmdq;
+module_param(bypass_vcmdq, bool, 0444);
+MODULE_PARM_DESC(bypass_vcmdq,
+ "This allows to bypass VCMDQ for debugging use or perf comparison.");
+
+/**
+ * struct tegra241_vcmdq - Virtual Command Queue
+ * @idx: Global index in the CMDQV HW
+ * @lidx: Local index in the VINTF
+ * @status: cached status register
+ * @cmdqv: CMDQV HW pointer
+ * @vintf: VINTF HW pointer
+ * @cmdq: Command Queue struct
+ * @base: MMIO base address
+ * @page0: MMIO Page0 base address
+ * @page1: MMIO Page1 base address
+ */
+struct tegra241_vcmdq {
+ u16 idx;
+ u16 lidx;
+
+ atomic_t status;
+
+ struct tegra241_cmdqv *cmdqv;
+ struct tegra241_vintf *vintf;
+ struct arm_smmu_cmdq cmdq;
+
+ void __iomem *base;
+ void __iomem *page0;
+ void __iomem *page1;
+};
+
+/**
+ * struct tegra241_vintf - Virtual Interface
+ * @idx: Global index in the CMDQV HW
+ * @status: cached status register
+ * @cmdqv: CMDQV HW pointer
+ * @vcmdqs: List of VCMDQ pointers
+ * @base: MMIO base address
+ */
+struct tegra241_vintf {
+ u16 idx;
+
+ atomic_t status;
+
+ struct tegra241_cmdqv *cmdqv;
+ struct tegra241_vcmdq **vcmdqs;
+
+ void __iomem *base;
+};
+
+/**
+ * struct tegra241_cmdqv - CMDQ-V for SMMUv3
+ * @smmu: SMMUv3 pointer
+ * @dev: Device pointer
+ * @base: MMIO base address
+ * @irq: IRQ number
+ * @num_vintfs: Total number of VINTFs
+ * @num_vcmdqs: Total number of VCMDQs
+ * @num_vcmdqs_per_vintf: Number of VCMDQs per VINTF
+ * @status: cached status register
+ * @vintf_ids: VINTF id allocator
+ * @vcmdq_ids: VCMDQ id allocator
+ * @vintfs: List of VINTFs
+ */
+struct tegra241_cmdqv {
+ struct arm_smmu_device *smmu;
+
+ struct device *dev;
+ void __iomem *base;
+ int irq;
+
+ /* CMDQV Hardware Params */
+ u16 num_vintfs;
+ u16 num_vcmdqs;
+ u16 num_vcmdqs_per_vintf;
+
+ atomic_t status;
+
+ struct ida vintf_ids;
+ struct ida vcmdq_ids;
+
+ struct tegra241_vintf **vintfs;
+};
+
+static void tegra241_cmdqv_handle_vintf0_error(struct tegra241_cmdqv *cmdqv)
+{
+ struct tegra241_vintf *vintf = cmdqv->vintfs[0];
+ int i;
+
+ /* Cache status to bypass VCMDQs until error is recovered */
+ atomic_set(&vintf->status, vintf_readl(STATUS));
+
+ for (i = 0; i < 4; i++) {
+ u32 lvcmdq_err_map = vintf_readl_relaxed(CMDQ_ERR_MAP(i));
+
+ while (lvcmdq_err_map) {
+ int lidx = ffs(lvcmdq_err_map) - 1;
+ struct tegra241_vcmdq *vcmdq = vintf->vcmdqs[lidx];
+ u32 gerrorn, gerror;
+
+ lvcmdq_err_map &= ~BIT(lidx);
+
+ __arm_smmu_cmdq_skip_err(cmdqv->smmu, &vcmdq->cmdq.q);
+
+ gerrorn = vcmdq_page0_readl_relaxed(GERRORN);
+ gerror = vcmdq_page0_readl_relaxed(GERROR);
+
+ vcmdq_page0_writel(gerror, GERRORN);
+ }
+ }
+
+ /* Now error status should be clean, cache it again */
+ atomic_set(&vintf->status, vintf_readl(STATUS));
+}
+
+static irqreturn_t tegra241_cmdqv_isr(int irq, void *devid)
+{
+ struct tegra241_cmdqv *cmdqv = (struct tegra241_cmdqv *)devid;
+ u32 vintf_errs[2];
+ u32 vcmdq_errs[4];
+
+ vintf_errs[0] = cmdqv_readl_relaxed(VINTF_ERR_MAP);
+ vintf_errs[1] = cmdqv_readl_relaxed(VINTF_ERR_MAP + 0x4);
+
+ vcmdq_errs[0] = cmdqv_readl_relaxed(VCMDQ_ERR_MAP(0));
+ vcmdq_errs[1] = cmdqv_readl_relaxed(VCMDQ_ERR_MAP(1));
+ vcmdq_errs[2] = cmdqv_readl_relaxed(VCMDQ_ERR_MAP(2));
+ vcmdq_errs[3] = cmdqv_readl_relaxed(VCMDQ_ERR_MAP(3));
+
+ cmdqv_warn("unexpected cmdqv error reported\n");
+ cmdqv_warn(" vintf_map: 0x%08X%08X\n", vintf_errs[1], vintf_errs[0]);
+ cmdqv_warn(" vcmdq_map: 0x%08X%08X%08X%08X\n",
+ vcmdq_errs[3], vcmdq_errs[2], vcmdq_errs[1], vcmdq_errs[0]);
+
+ /* Handle VINTF0 and its VCMDQs */
+ if (vintf_errs[0] & 0x1)
+ tegra241_cmdqv_handle_vintf0_error(cmdqv);
+
+ return IRQ_HANDLED;
+}
+
+struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
+{
+ struct tegra241_cmdqv *cmdqv = smmu->tegra241_cmdqv;
+ struct tegra241_vintf *vintf = cmdqv->vintfs[0];
+ struct tegra241_vcmdq *vcmdq;
+ u16 lidx;
+
+ if (bypass_vcmdq)
+ return &smmu->cmdq;
+
+ /* Use SMMU CMDQ if vintfs[0] is uninitialized */
+ if (!FIELD_GET(VINTF_ENABLED, atomic_read(&vintf->status)))
+ return &smmu->cmdq;
+
+ /* Use SMMU CMDQ if vintfs[0] has error status */
+ if (FIELD_GET(VINTF_STATUS, atomic_read(&vintf->status)))
+ return &smmu->cmdq;
+
+ /*
+ * Select a vcmdq to use. Here we use a temporary solution to
+ * balance out traffic on cmdq issuing: each cmdq has its own
+ * lock, if all cpus issue cmdlist using the same cmdq, only
+ * one CPU at a time can enter the process, while the others
+ * will be spinning at the same lock.
+ */
+ lidx = smp_processor_id() % cmdqv->num_vcmdqs_per_vintf;
+ vcmdq = vintf->vcmdqs[lidx];
+ if (!FIELD_GET(VCMDQ_ENABLED, atomic_read(&vcmdq->status)))
+ return &smmu->cmdq;
+ return &vcmdq->cmdq;
+}
+
+static void tegra241_vcmdq_hw_deinit(struct tegra241_vcmdq *vcmdq)
+{
+ u32 gerrorn, gerror;
+
+ if (vcmdq_write_config(0)) {
+ vcmdq_err("GERRORN=0x%X\n", vcmdq_page0_readl_relaxed(GERRORN));
+ vcmdq_err("GERROR=0x%X\n", vcmdq_page0_readl_relaxed(GERROR));
+ vcmdq_err("CONS=0x%X\n", vcmdq_page0_readl_relaxed(CONS));
+ }
+ vcmdq_page0_writel_relaxed(0, PROD);
+ vcmdq_page0_writel_relaxed(0, CONS);
+ vcmdq_page1_writeq_relaxed(0, BASE);
+ vcmdq_page1_writeq_relaxed(0, CONS_INDX_BASE);
+
+ gerrorn = vcmdq_page0_readl_relaxed(GERRORN);
+ gerror = vcmdq_page0_readl_relaxed(GERROR);
+ if (gerror != gerrorn) {
+ vcmdq_info("Uncleared error detected, resetting\n");
+ vcmdq_page0_writel(gerror, GERRORN);
+ }
+
+ vcmdq_dbg("deinited\n");
+}
+
+static int tegra241_vcmdq_hw_init(struct tegra241_vcmdq *vcmdq)
+{
+ int ret;
+
+ /* Configure and enable the vcmdq */
+ tegra241_vcmdq_hw_deinit(vcmdq);
+
+ vcmdq_page1_writeq_relaxed(vcmdq->cmdq.q.q_base, BASE);
+
+ ret = vcmdq_write_config(VCMDQ_EN);
+ if (ret) {
+ vcmdq_err("GERRORN=0x%X\n", vcmdq_page0_readl_relaxed(GERRORN));
+ vcmdq_err("GERROR=0x%X\n", vcmdq_page0_readl_relaxed(GERROR));
+ vcmdq_err("CONS=0x%X\n", vcmdq_page0_readl_relaxed(CONS));
+ return ret;
+ }
+
+ vcmdq_dbg("inited\n");
+ return 0;
+}
+
+/* Adapt struct arm_smmu_cmdq init sequences from arm-smmu-v3.c for VCMDQs */
+static int tegra241_vcmdq_alloc_smmu_cmdq(struct tegra241_vcmdq *vcmdq)
+{
+ struct arm_smmu_device *smmu = vcmdq->cmdqv->smmu;
+ struct arm_smmu_cmdq *cmdq = &vcmdq->cmdq;
+ struct arm_smmu_queue *q = &cmdq->q;
+ char name[16];
+ int ret;
+
+ sprintf(name, "vcmdq%u", vcmdq->idx);
+
+ q->llq.max_n_shift = ilog2(SZ_64K >> CMDQ_ENT_SZ_SHIFT);
+
+ /* Use the common helper to init the VCMDQ, and then... */
+ ret = arm_smmu_init_one_queue(smmu, q, vcmdq->page0,
+ TEGRA241_VCMDQ_PROD, TEGRA241_VCMDQ_CONS,
+ CMDQ_ENT_DWORDS, name);
+ if (ret)
+ return ret;
+
+ /* ...override q_base to write VCMDQ_BASE registers */
+ q->q_base = q->base_dma & VCMDQ_ADDR;
+ q->q_base |= FIELD_PREP(VCMDQ_LOG2SIZE, q->llq.max_n_shift);
+
+ /* All VCMDQs support CS_NONE only for CMD_SYNC */
+ q->quirks = CMDQ_QUIRK_SYNC_CS_NONE_ONLY;
+
+ return arm_smmu_cmdq_init(smmu, cmdq);
+}
+
+static void tegra241_vcmdq_free_smmu_cmdq(struct tegra241_vcmdq *vcmdq)
+{
+ struct tegra241_cmdqv *cmdqv = vcmdq->cmdqv;
+ struct arm_smmu_queue *q = &vcmdq->cmdq.q;
+ size_t nents = 1 << q->llq.max_n_shift;
+
+ dmam_free_coherent(cmdqv->smmu->dev, (nents * CMDQ_ENT_DWORDS) << 3,
+ q->base, q->base_dma);
+}
+
+static int tegra241_vintf_lvcmdq_init(struct tegra241_vintf *vintf, u16 lidx,
+ struct tegra241_vcmdq *vcmdq)
+{
+ struct tegra241_cmdqv *cmdqv = vintf->cmdqv;
+ u16 idx = vintf->idx;
+ int qidx;
+
+ qidx = ida_alloc_max(&cmdqv->vcmdq_ids,
+ cmdqv->num_vcmdqs - 1, GFP_KERNEL);
+ if (qidx < 0)
+ return qidx;
+
+ vcmdq->idx = qidx;
+ vcmdq->lidx = lidx;
+ vcmdq->cmdqv = cmdqv;
+ vcmdq->vintf = vintf;
+ vcmdq->page0 = cmdqv->base + TEGRA241_VINTFi_LVCMDQ_PAGE0(idx, lidx);
+ vcmdq->page1 = cmdqv->base + TEGRA241_VINTFi_LVCMDQ_PAGE1(idx, lidx);
+ vcmdq->base = vcmdq->page0; /* CONFIG register is in page0 */
+ return 0;
+}
+
+static void tegra241_vintf_lvcmdq_deinit(struct tegra241_vcmdq *vcmdq)
+{
+ ida_free(&vcmdq->cmdqv->vcmdq_ids, vcmdq->idx);
+}
+
+static struct tegra241_vcmdq *
+tegra241_vintf_lvcmdq_alloc(struct tegra241_vintf *vintf, u16 lidx)
+{
+ struct tegra241_cmdqv *cmdqv = vintf->cmdqv;
+ struct tegra241_vcmdq *vcmdq;
+ int ret;
+
+ vcmdq = devm_kzalloc(cmdqv->dev, sizeof(*vcmdq), GFP_KERNEL);
+ if (!vcmdq)
+ return ERR_PTR(-ENOMEM);
+
+ ret = tegra241_vintf_lvcmdq_init(vintf, lidx, vcmdq);
+ if (ret)
+ goto free_vcmdq;
+
+ /* Setup struct arm_smmu_cmdq data members */
+ ret = tegra241_vcmdq_alloc_smmu_cmdq(vcmdq);
+ if (ret)
+ goto deinit_lvcmdq;
+
+ ret = tegra241_vcmdq_hw_init(vcmdq);
+ if (ret)
+ goto free_queue;
+
+ vcmdq_dbg("allocated\n");
+ return vcmdq;
+free_queue:
+ tegra241_vcmdq_free_smmu_cmdq(vcmdq);
+deinit_lvcmdq:
+ tegra241_vintf_lvcmdq_deinit(vcmdq);
+free_vcmdq:
+ devm_kfree(cmdqv->dev, vcmdq);
+ return ERR_PTR(ret);
+}
+
+static void tegra241_vintf_lvcmdq_free(struct tegra241_vcmdq *vcmdq)
+{
+ tegra241_vcmdq_hw_deinit(vcmdq);
+ tegra241_vcmdq_free_smmu_cmdq(vcmdq);
+ tegra241_vintf_lvcmdq_deinit(vcmdq);
+ devm_kfree(vcmdq->cmdqv->dev, vcmdq);
+}
+
+int tegra241_cmdqv_device_reset(struct arm_smmu_device *smmu)
+{
+ struct tegra241_cmdqv *cmdqv = smmu->tegra241_cmdqv;
+ struct tegra241_vintf *vintf = cmdqv->vintfs[0];
+ int qidx, lidx, idx, ret;
+ u32 regval;
+
+ /* Reset CMDQV */
+ regval = cmdqv_readl_relaxed(CONFIG);
+ ret = cmdqv_write_config(regval & ~CMDQV_EN);
+ if (ret)
+ return ret;
+ ret = cmdqv_write_config(regval | CMDQV_EN);
+ if (ret)
+ return ret;
+
+ /* Reset and configure vintf0 */
+ ret = vintf_write_config(0);
+ if (ret)
+ return ret;
+
+ /* Pre-allocate num_vcmdqs_per_vintf of VCMDQs to each VINTF */
+ for (idx = 0, qidx = 0; idx < cmdqv->num_vintfs; idx++) {
+ for (lidx = 0; lidx < cmdqv->num_vcmdqs_per_vintf; lidx++) {
+ regval = FIELD_PREP(CMDQV_CMDQ_ALLOC_VINTF, idx);
+ regval |= FIELD_PREP(CMDQV_CMDQ_ALLOC_LVCMDQ, lidx);
+ regval |= CMDQV_CMDQ_ALLOCATED;
+ cmdqv_writel_relaxed(regval, CMDQ_ALLOC(qidx++));
+ }
+ }
+
+ regval = FIELD_PREP(VINTF_HYP_OWN, 1);
+ vintf_writel(regval, CONFIG);
+
+ ret = vintf_write_config(regval | VINTF_EN);
+ if (ret)
+ return ret;
+
+ /* Build an arm_smmu_cmdq for each vcmdq allocated to vintf */
+ vintf->vcmdqs = devm_kcalloc(cmdqv->dev, cmdqv->num_vcmdqs_per_vintf,
+ sizeof(*vintf->vcmdqs), GFP_KERNEL);
+ if (!vintf->vcmdqs)
+ return -ENOMEM;
+
+ /* Allocate logical vcmdqs to vintf */
+ for (lidx = 0; lidx < cmdqv->num_vcmdqs_per_vintf; lidx++) {
+ struct tegra241_vcmdq *vcmdq;
+
+ vcmdq = tegra241_vintf_lvcmdq_alloc(vintf, lidx);
+ if (IS_ERR(vcmdq)) {
+ ret = PTR_ERR(vcmdq);
+ goto free_lvcmdq;
+ }
+ vintf->vcmdqs[lidx] = vcmdq;
+ }
+
+ return 0;
+free_lvcmdq:
+ for (lidx--; lidx >= 0; lidx--)
+ tegra241_vintf_lvcmdq_free(vintf->vcmdqs[lidx]);
+ devm_kfree(cmdqv->dev, vintf->vcmdqs);
+ return ret;
+}
+
+static int tegra241_cmdqv_acpi_is_memory(struct acpi_resource *res, void *data)
+{
+ struct resource_win win;
+
+ return !acpi_dev_resource_address_space(res, &win);
+}
+
+static int tegra241_cmdqv_acpi_get_irqs(struct acpi_resource *ares, void *data)
+{
+ struct resource r;
+ int *irq = data;
+
+ if (*irq <= 0 && acpi_dev_resource_interrupt(ares, 0, &r))
+ *irq = r.start;
+ return 1; /* No need to add resource to the list */
+}
+
+static struct tegra241_cmdqv *
+tegra241_cmdqv_find_resource(struct arm_smmu_device *smmu, int id)
+{
+ struct tegra241_cmdqv *cmdqv = NULL;
+ struct device *dev = smmu->dev;
+ struct list_head resource_list;
+ struct resource_entry *rentry;
+ struct acpi_device *adev;
+ const char *match_uid;
+ int ret;
+
+ if (acpi_disabled)
+ return NULL;
+
+ /* Look for a device in the DSDT whose _UID matches the SMMU node ID */
+ match_uid = kasprintf(GFP_KERNEL, "%u", id);
+ if (!match_uid)
+ return NULL;
+ adev = acpi_dev_get_first_match_dev(TEGRA241_CMDQV_HID, match_uid, -1);
+ kfree(match_uid);
+
+ if (!adev)
+ return NULL;
+
+ dev_info(dev, "found companion CMDQV device, %s\n",
+ dev_name(&adev->dev));
+
+ INIT_LIST_HEAD(&resource_list);
+ ret = acpi_dev_get_resources(adev, &resource_list,
+ tegra241_cmdqv_acpi_is_memory, NULL);
+ if (ret < 0) {
+ dev_err(dev, "failed to get memory resource: %d\n", ret);
+ goto put_dev;
+ }
+
+ cmdqv = devm_kzalloc(dev, sizeof(*cmdqv), GFP_KERNEL);
+ if (!cmdqv)
+ goto free_list;
+
+ cmdqv->dev = dev;
+ cmdqv->smmu = smmu;
+
+ rentry = list_first_entry_or_null(&resource_list,
+ struct resource_entry, node);
+ if (!rentry) {
+ cmdqv_err("failed to get memory resource entry\n");
+ goto free_cmdqv;
+ }
+
+ cmdqv->base = devm_ioremap_resource(smmu->dev, rentry->res);
+ if (IS_ERR(cmdqv->base)) {
+ cmdqv_err("failed to ioremap: %ld\n", PTR_ERR(cmdqv->base));
+ goto free_cmdqv;
+ }
+
+ acpi_dev_free_resource_list(&resource_list);
+
+ INIT_LIST_HEAD(&resource_list);
+
+ ret = acpi_dev_get_resources(adev, &resource_list,
+ tegra241_cmdqv_acpi_get_irqs, &cmdqv->irq);
+ if (ret < 0 || cmdqv->irq <= 0) {
+ cmdqv_warn("no cmdqv interrupt. errors will not be reported\n");
+ } else {
+ ret = devm_request_irq(smmu->dev, cmdqv->irq,
+ tegra241_cmdqv_isr, 0,
+ "tegra241-cmdqv", cmdqv);
+ if (ret) {
+ cmdqv_err("failed to request irq (%d): %d\n",
+ cmdqv->irq, ret);
+ goto iounmap;
+ }
+ }
+
+ goto free_list;
+
+iounmap:
+ devm_iounmap(cmdqv->dev, cmdqv->base);
+free_cmdqv:
+ devm_kfree(cmdqv->dev, cmdqv);
+ cmdqv = NULL;
+free_list:
+ acpi_dev_free_resource_list(&resource_list);
+put_dev:
+ put_device(&adev->dev);
+
+ return cmdqv;
+}
+
+struct dentry *cmdqv_debugfs_dir;
+
+static int tegra241_cmdqv_probe(struct tegra241_cmdqv *cmdqv)
+{
+ struct tegra241_vintf *vintf;
+ u32 regval;
+ int ret;
+
+ regval = cmdqv_readl(CONFIG);
+ if (disable_cmdqv) {
+ cmdqv_info("disable_cmdqv=true. Falling back to SMMU CMDQ\n");
+ cmdqv_write_config(regval & ~CMDQV_EN);
+ return -ENODEV;
+ }
+
+ ret = cmdqv_write_config(regval | CMDQV_EN);
+ if (ret)
+ return ret;
+
+ regval = cmdqv_readl_relaxed(PARAM);
+ cmdqv->num_vintfs = 1 << FIELD_GET(CMDQV_NUM_VINTF_LOG2, regval);
+ cmdqv->num_vcmdqs = 1 << FIELD_GET(CMDQV_NUM_VCMDQ_LOG2, regval);
+ cmdqv->num_vcmdqs_per_vintf = cmdqv->num_vcmdqs / cmdqv->num_vintfs;
+
+ cmdqv->vintfs = devm_kcalloc(cmdqv->dev, cmdqv->num_vintfs,
+ sizeof(*cmdqv->vintfs), GFP_KERNEL);
+ if (!cmdqv->vintfs)
+ return -ENOMEM;
+
+ vintf = devm_kzalloc(cmdqv->dev, sizeof(*vintf), GFP_KERNEL);
+ if (!vintf) {
+ ret = -ENOMEM;
+ goto free_vintfs;
+ }
+
+ ida_init(&cmdqv->vintf_ids);
+ ida_init(&cmdqv->vcmdq_ids);
+
+ /* Reserve vintfs[0] for in-kernel use */
+ ret = ida_alloc_max(&cmdqv->vintf_ids, 0, GFP_KERNEL);
+ if (ret != 0) {
+ cmdqv_err("failed to reserve vintf0: ret %d\n", ret);
+ if (ret > 0)
+ ret = -EBUSY;
+ goto destroy_ids;
+ }
+ vintf->idx = 0;
+ cmdqv->vintfs[0] = vintf;
+
+ vintf->cmdqv = cmdqv;
+ vintf->base = cmdqv->base + TEGRA241_VINTF(0);
+
+#ifdef CONFIG_IOMMU_DEBUGFS
+ if (!cmdqv_debugfs_dir) {
+ cmdqv_debugfs_dir = debugfs_create_dir("tegra241_cmdqv", iommu_debugfs_dir);
+ debugfs_create_bool("bypass_vcmdq", 0644, cmdqv_debugfs_dir, &bypass_vcmdq);
+ }
+#endif
+
+ return 0;
+destroy_ids:
+ ida_destroy(&cmdqv->vcmdq_ids);
+ ida_destroy(&cmdqv->vintf_ids);
+ devm_kfree(cmdqv->dev, vintf);
+free_vintfs:
+ devm_kfree(cmdqv->dev, cmdqv->vintfs);
+ return ret;
+}
+
+struct tegra241_cmdqv *
+tegra241_cmdqv_acpi_probe(struct arm_smmu_device *smmu, int id)
+{
+ struct tegra241_cmdqv *cmdqv;
+
+ cmdqv = tegra241_cmdqv_find_resource(smmu, id);
+ if (!cmdqv)
+ return NULL;
+
+ if (tegra241_cmdqv_probe(cmdqv)) {
+ if (cmdqv->irq > 0)
+ devm_free_irq(smmu->dev, cmdqv->irq, cmdqv);
+ devm_iounmap(smmu->dev, cmdqv->base);
+ devm_kfree(smmu->dev, cmdqv);
+ return NULL;
+ }
+
+ return cmdqv;
+}
--
2.43.0


2024-04-30 04:45:56

by Nicolin Chen

Subject: [PATCH v6 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF

When VCMDQs are assigned to a VINTF owned by a guest (HYP_OWN bit unset),
only TLB and ATC invalidation commands are supported by the VCMDQ HW. So,
add a new helper to scan the input cmds to make sure every single command
is supported when selecting a queue.

Note that a guest VM must never see the HYP_OWN bit set, regardless of
whether the guest kernel driver writes it, i.e. the hypervisor running in
the host OS should wire this bit to zero when trapping a write access to
this VINTF_CONFIG register from a guest kernel.

Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 +--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 ++-
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 44 ++++++++++++++++++-
3 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 665a5e585f72..0802c3c96a2a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -352,10 +352,11 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
return 0;
}

-static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
+static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu,
+ u64 *cmds, int n)
{
if (smmu->tegra241_cmdqv)
- return tegra241_cmdqv_get_cmdq(smmu);
+ return tegra241_cmdqv_get_cmdq(smmu, cmds, n);

return &smmu->cmdq;
}
@@ -766,7 +767,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
u32 prod;
unsigned long flags;
bool owner;
- struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
+ struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu, cmds, n);
struct arm_smmu_ll_queue llq, head;
int ret = 0;

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 87e4c227a937..e21e29f4770b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -837,7 +837,8 @@ static inline void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain,
struct tegra241_cmdqv *
tegra241_cmdqv_acpi_probe(struct arm_smmu_device *smmu, int id);
int tegra241_cmdqv_device_reset(struct arm_smmu_device *smmu);
-struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu);
+struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu,
+ u64 *cmds, int n);
#else /* CONFIG_TEGRA241_CMDQV */
static inline struct tegra241_cmdqv *
tegra241_cmdqv_acpi_probe(struct arm_smmu_device *smmu, int id)
@@ -851,7 +852,7 @@ static inline int tegra241_cmdqv_device_reset(struct arm_smmu_device *smmu)
}

static inline struct arm_smmu_cmdq *
-tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
+tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu, u64 *cmds, int n)
{
return NULL;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
index 4b2af3aaa6b4..59ff2b740bec 100644
--- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -266,6 +266,7 @@ struct tegra241_vcmdq {
* struct tegra241_vintf - Virtual Interface
* @idx: Global index in the CMDQV HW
* @status: cached status register
+ * @hyp_own: Owned by hypervisor (in-kernel)
* @cmdqv: CMDQV HW pointer
* @vcmdqs: List of VCMDQ pointers
* @base: MMIO base address
@@ -274,6 +275,7 @@ struct tegra241_vintf {
u16 idx;

atomic_t status;
+ bool hyp_own;

struct tegra241_cmdqv *cmdqv;
struct tegra241_vcmdq **vcmdqs;
@@ -372,7 +374,32 @@ static irqreturn_t tegra241_cmdqv_isr(int irq, void *devid)
return IRQ_HANDLED;
}

-struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
+static bool tegra241_vintf_support_cmds(struct tegra241_vintf *vintf,
+ u64 *cmds, int n)
+{
+ int i;
+
+ /* VINTF owned by hypervisor can execute any command */
+ if (vintf->hyp_own)
+ return true;
+
+ /* Guest-owned VINTF must check against the list of supported CMDs */
+ for (i = 0; i < n; i++) {
+ switch (FIELD_GET(CMDQ_0_OP, cmds[i * CMDQ_ENT_DWORDS])) {
+ case CMDQ_OP_TLBI_NH_ASID:
+ case CMDQ_OP_TLBI_NH_VA:
+ case CMDQ_OP_ATC_INV:
+ continue;
+ default:
+ return false;
+ }
+ }
+
+ return true;
+}
+
+struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu,
+ u64 *cmds, int n)
{
struct tegra241_cmdqv *cmdqv = smmu->tegra241_cmdqv;
struct tegra241_vintf *vintf = cmdqv->vintfs[0];
@@ -390,6 +417,10 @@ struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
if (FIELD_GET(VINTF_STATUS, atomic_read(&vintf->status)))
return &smmu->cmdq;

+ /* Unsupported CMDs go for smmu->cmdq pathway */
+ if (!tegra241_vintf_support_cmds(vintf, cmds, n))
+ return &smmu->cmdq;
+
/*
* Select a vcmdq to use. Here we use a temporal solution to
* balance out traffic on cmdq issuing: each cmdq has its own
@@ -590,6 +621,11 @@ int tegra241_cmdqv_device_reset(struct arm_smmu_device *smmu)
}
}

+ /*
+ * Note that HYP_OWN bit is wired to zero when running in guest kernel
+ * regardless of enabling it here, as !HYP_OWN cmdqs have a restricted
+ * set of supported commands, by following the HW design.
+ */
regval = FIELD_PREP(VINTF_HYP_OWN, 1);
vintf_writel(regval, CONFIG);

@@ -597,6 +633,12 @@ int tegra241_cmdqv_device_reset(struct arm_smmu_device *smmu)
if (ret)
return ret;

+ /*
+ * As mentioned above, the HYP_OWN bit is wired to zero for a guest
+ * kernel, so read it back from HW to make sure hyp_own reflects that
+ */
+ vintf->hyp_own = !!(VINTF_HYP_OWN & vintf_readl(CONFIG));
+
/* Build an arm_smmu_cmdq for each vcmdq allocated to vintf */
vintf->vcmdqs = devm_kcalloc(cmdqv->dev, cmdqv->num_vcmdqs_per_vintf,
sizeof(*vintf->vcmdqs), GFP_KERNEL);
--
2.43.0


2024-04-30 14:19:09

by Jason Gunthorpe

Subject: Re: [PATCH v6 4/6] iommu/arm-smmu-v3: Make __arm_smmu_cmdq_skip_err reusable

On Mon, Apr 29, 2024 at 09:43:47PM -0700, Nicolin Chen wrote:
> Allow __arm_smmu_cmdq_skip_err function to be reused by NVIDIA Tegra241
> CMDQV unit since it will use the same data structure for q. And include
> the CMDQ_QUIRK_SYNC_CS_NONE_ONLY quirk when inserting a CMD_SYNC.
>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 +++++--
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++
> 2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 538850059bdd..5111859347d5 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -379,8 +379,8 @@ static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
> arm_smmu_cmdq_build_cmd(cmd, &ent);
> }
>
> -static void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
> - struct arm_smmu_queue *q)
> +void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
> + struct arm_smmu_queue *q)
> {
> static const char * const cerror_str[] = {
> [CMDQ_ERR_CERROR_NONE_IDX] = "No error",
> @@ -428,6 +428,9 @@ static void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
> for (i = 0; i < ARRAY_SIZE(cmd); ++i)
> dev_err(smmu->dev, "\t0x%016llx\n", (unsigned long long)cmd[i]);
>
> + if (q->quirks & CMDQ_QUIRK_SYNC_CS_NONE_ONLY)
> + cmd_sync.sync.cs_none = true;

This hunk should be in "iommu/arm-smmu-v3: Add CS_NONE quirk" ?

Jason

2024-04-30 14:30:02

by Jason Gunthorpe

Subject: Re: [PATCH v6 3/6] iommu/arm-smmu-v3: Make arm_smmu_cmdq_init reusable

On Mon, Apr 29, 2024 at 09:43:46PM -0700, Nicolin Chen wrote:
> The CMDQV extension in NVIDIA Tegra241 SoC reuses the arm_smmu_cmdq
> structure while the queue location isn't the same as smmu->cmdq.
>
> Add a cmdq argument to the arm_smmu_cmdq_init() function and share its
> declaration in the header for the CMDQV driver to use.
>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +++---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +++
> 2 files changed, 6 insertions(+), 3 deletions(-)

I would squash this patch and the next together:

iommu/arm-smmu-v3: Make symbols public

The symbols arm_smmu_cmdq_init() and __arm_smmu_cmdq_skip_err() need
to be used by the tegra241-cmdqv.c compilation unit in the next
patch. Remove the static and put prototypes in the header.

But the code is fine

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2024-04-30 15:49:14

by Nicolin Chen

Subject: Re: [PATCH v6 4/6] iommu/arm-smmu-v3: Make __arm_smmu_cmdq_skip_err reusable

On Tue, Apr 30, 2024 at 11:06:00AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2024 at 09:43:47PM -0700, Nicolin Chen wrote:
> > Allow __arm_smmu_cmdq_skip_err function to be reused by NVIDIA Tegra241
> > CMDQV unit since it will use the same data structure for q. And include
> > the CMDQ_QUIRK_SYNC_CS_NONE_ONLY quirk when inserting a CMD_SYNC.
> >
> > Signed-off-by: Nicolin Chen <[email protected]>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 +++++--
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++
> > 2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 538850059bdd..5111859347d5 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -379,8 +379,8 @@ static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
> > arm_smmu_cmdq_build_cmd(cmd, &ent);
> > }
> >
> > -static void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
> > - struct arm_smmu_queue *q)
> > +void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
> > + struct arm_smmu_queue *q)
> > {
> > static const char * const cerror_str[] = {
> > [CMDQ_ERR_CERROR_NONE_IDX] = "No error",
> > @@ -428,6 +428,9 @@ static void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
> > for (i = 0; i < ARRAY_SIZE(cmd); ++i)
> > dev_err(smmu->dev, "\t0x%016llx\n", (unsigned long long)cmd[i]);
> >
> > + if (q->quirks & CMDQ_QUIRK_SYNC_CS_NONE_ONLY)
> > + cmd_sync.sync.cs_none = true;
>
> This hunk should be in "iommu/arm-smmu-v3: Add CS_NONE quirk" ?

Oh, yea. Will move it.

Thanks!
Nicolin

2024-04-30 15:55:03

by Nicolin Chen

Subject: Re: [PATCH v6 3/6] iommu/arm-smmu-v3: Make arm_smmu_cmdq_init reusable

On Tue, Apr 30, 2024 at 11:24:08AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2024 at 09:43:46PM -0700, Nicolin Chen wrote:
> > The CMDQV extension in NVIDIA Tegra241 SoC reuses the arm_smmu_cmdq
> > structure while the queue location isn't the same as smmu->cmdq.
> >
> > Add a cmdq argument to the arm_smmu_cmdq_init() function and share its
> > declaration in the header for the CMDQV driver to use.
> >
> > Signed-off-by: Nicolin Chen <[email protected]>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +++---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +++
> > 2 files changed, 6 insertions(+), 3 deletions(-)
>
> I would squash this patch and the next together:
>
> iommu/arm-smmu-v3: Make symbols public
>
> The symbols arm_smmu_cmdq_init() and __arm_smmu_cmdq_skip_err() need
> to be used by the tegra241-cmdqv.c compilation unit in the next
> patch. Remove the static and put prototypes in the header.
>
> But the code is fine
>
> Reviewed-by: Jason Gunthorpe <[email protected]>

Then, arm_smmu_init_one_queue could be moved to this patch too.

Thanks
Nicolin

2024-04-30 16:54:21

by Jason Gunthorpe

Subject: Re: [PATCH v6 5/6] iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV

On Mon, Apr 29, 2024 at 09:43:48PM -0700, Nicolin Chen wrote:

> static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
> {
> + if (smmu->tegra241_cmdqv)
> + return tegra241_cmdqv_get_cmdq(smmu);

Since it is compile time optional it would make some sense to optimize
this (in all the places) too:

if (arm_smmu_has_smmu_tegra241_cmdqv(smmu))
[..]

static inline bool arm_smmu_has_smmu_tegra241_cmdqv(struct arm_smmu_device *smmu)
{
return IS_ENABLED(CONFIG_TEGRA241_CMDQV) && smmu->tegra241_cmdqv;
}
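
To illustrate why such a helper can help, here is a minimal userspace sketch, not kernel code. The `HAVE_TEGRA241_CMDQV` macro is a hypothetical stand-in for `IS_ENABLED(CONFIG_TEGRA241_CMDQV)`, and `struct smmu_dev` with its members is invented for the example. When the macro is 0, the condition is a compile-time constant, so the compiler is free to drop the VCMDQ branch entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_TEGRA241_CMDQV);
 * build with -DHAVE_TEGRA241_CMDQV=1 to model an enabled config. */
#ifndef HAVE_TEGRA241_CMDQV
#define HAVE_TEGRA241_CMDQV 0
#endif

/* Invented model of the SMMU device for this sketch only */
struct smmu_dev {
	void *cmdqv;   /* non-NULL once a CMDQV instance was probed */
	int main_cmdq; /* stand-in for smmu->cmdq */
	int vcmdq;     /* stand-in for a VCMDQ */
};

static inline bool smmu_has_cmdqv(const struct smmu_dev *smmu)
{
	assert(smmu);
	/* Constant-false when the feature is compiled out */
	return HAVE_TEGRA241_CMDQV && smmu->cmdqv != NULL;
}

static inline int *smmu_get_cmdq(struct smmu_dev *smmu)
{
	if (smmu_has_cmdqv(smmu))
		return &smmu->vcmdq;
	return &smmu->main_cmdq;
}
```

With the feature compiled out, `smmu_get_cmdq()` reduces to returning the main queue, which is the effect the suggested helper is after.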

> @@ -3105,12 +3108,10 @@ static struct iommu_ops arm_smmu_ops = {
> };
>
> /* Probing and initialisation functions */
> -static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
> - struct arm_smmu_queue *q,
> - void __iomem *page,
> - unsigned long prod_off,
> - unsigned long cons_off,
> - size_t dwords, const char *name)
> +int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
> + struct arm_smmu_queue *q, void __iomem *page,
> + unsigned long prod_off, unsigned long cons_off,
> + size_t dwords, const char *name)
> {
> size_t qsz;

This hunk and the .h file part should be moved to the prior patch that
is making the symbols non-static.

> +/* MMIO helpers */
> +#define cmdqv_readl(reg) \
> + readl(cmdqv->base + TEGRA241_CMDQV_##reg)
> +#define cmdqv_readl_relaxed(reg) \
> + readl_relaxed(cmdqv->base + TEGRA241_CMDQV_##reg)
> +#define cmdqv_writel(val, reg) \
> + writel((val), cmdqv->base + TEGRA241_CMDQV_##reg)
> +#define cmdqv_writel_relaxed(val, reg) \
> + writel_relaxed((val), cmdqv->base + TEGRA241_CMDQV_##reg)

Please don't hide access to a stack variable in a macro, and I'm not
keen on the ##reg scheme either - it makes it much harder to search
for things.

Really this all seems like a lot of overkill to make a little bit of
shorthand. It is not so wordy just to type it out:

readl(vintf->base + TEGRA241_VINTF_CONFIG)

> +/* Logging helpers */
> +#define cmdqv_warn(fmt, ...) \
> + dev_warn(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
> +#define cmdqv_err(fmt, ...) \
> + dev_err(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
> +#define cmdqv_info(fmt, ...) \
> + dev_info(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
> +#define cmdqv_dbg(fmt, ...) \
> + dev_dbg(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)

Really not sure these are necessary, same remark about the stack
variable.

Also cmdqv->dev is the wrong thing to print, this is part of the smmu driver,
it should print cmdqv->smmu->dev for consistency

> +#define vintf_warn(fmt, ...) \
> + dev_warn(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> +#define vintf_err(fmt, ...) \
> + dev_err(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> +#define vintf_info(fmt, ...) \
> + dev_info(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> +#define vintf_dbg(fmt, ...) \
> + dev_dbg(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> +
> +#define vcmdq_warn(fmt, ...) \
> + ({ \
> + struct tegra241_vintf *vintf = vcmdq->vintf; \
> + if (vintf) \
> + vintf_warn("VCMDQ%u/LVCMDQ%u: " fmt, \
> + vcmdq->idx, vcmdq->lidx, \
> + ##__VA_ARGS__); \
> + else \
> + dev_warn(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt, \
> + vcmdq->idx, ##__VA_ARGS__); \
> + })
> +#define vcmdq_err(fmt, ...) \
> + ({ \
> + struct tegra241_vintf *vintf = vcmdq->vintf; \
> + if (vintf) \
> + vintf_err("VCMDQ%u/LVCMDQ%u: " fmt, \
> + vcmdq->idx, vcmdq->lidx, \
> + ##__VA_ARGS__); \
> + else \
> + dev_err(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt, \
> + vcmdq->idx, ##__VA_ARGS__); \
> + })
> +#define vcmdq_info(fmt, ...) \
> + ({ \
> + struct tegra241_vintf *vintf = vcmdq->vintf; \
> + if (vintf) \
> + vintf_info("VCMDQ%u/LVCMDQ%u: " fmt, \
> + vcmdq->idx, vcmdq->lidx, \
> + ##__VA_ARGS__); \
> + else \
> + dev_info(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt, \
> + vcmdq->idx, ##__VA_ARGS__); \
> + })
> +#define vcmdq_dbg(fmt, ...) \
> + ({ \
> + struct tegra241_vintf *vintf = vcmdq->vintf; \
> + if (vintf) \
> + vintf_dbg("VCMDQ%u/LVCMDQ%u: " fmt, \
> + vcmdq->idx, vcmdq->lidx, \
> + ##__VA_ARGS__); \
> + else \
> + dev_dbg(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt, \
> + vcmdq->idx, ##__VA_ARGS__); \
> + })

Some of these are barely used, is it worth all these macros??

> +
> +/* Configuring and polling helpers */
> +#define tegra241_cmdqv_write_config(_owner, _OWNER, _regval) \
> + ({ \
> + bool _en = (_regval) & _OWNER##_EN; \
> + u32 _status; \
> + int _ret; \
> + writel((_regval), _owner->base + TEGRA241_##_OWNER##_CONFIG); \
> + _ret = readl_poll_timeout( \
> + _owner->base + TEGRA241_##_OWNER##_STATUS, _status, \
> + _en ? (_regval) & _OWNER##_ENABLED : \
> + !((_regval) & _OWNER##_ENABLED), \
> + 1, ARM_SMMU_POLL_TIMEOUT_US); \
> + if (_ret) \
> + _owner##_err("failed to %sable, STATUS = 0x%08X\n", \
> + _en ? "en" : "dis", _status); \
> + atomic_set(&_owner->status, _status); \
> + _ret; \
> + })

I feel like this could be an actual inline function without the macro
wrapper with a little fiddling.
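
As a rough illustration of that idea (not the driver's actual code), the userspace sketch below models the write-then-poll sequence as a plain inline function. The `struct cfg_regs` type, the `write_config_poll()` name and the bounded poll count are assumptions made for the example; a kernel version would presumably use writel() and readl_poll_timeout() on MMIO addresses instead. The sketch compares the ENABLED bit in the value read back from STATUS with the requested EN bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one "owner" block (CMDQV, VINTF or VCMDQ) */
struct cfg_regs {
	volatile uint32_t *config; /* CONFIG register */
	volatile uint32_t *status; /* STATUS register */
	uint32_t en_bit;           /* e.g. the *_EN bit in CONFIG */
	uint32_t enabled_bit;      /* e.g. the *_ENABLED bit in STATUS */
};

/*
 * Write CONFIG, then poll STATUS until its ENABLED bit matches the EN bit
 * that was requested. Returns 0 on success and -1 if max_polls reads did
 * not observe the expected state. The last STATUS value read is stored in
 * *status_out so the caller can cache or log it.
 */
static inline int write_config_poll(const struct cfg_regs *r, uint32_t regval,
				    unsigned int max_polls,
				    uint32_t *status_out)
{
	bool want_en = (regval & r->en_bit) != 0;
	uint32_t status = 0;
	unsigned int i;

	assert(r && status_out);

	*r->config = regval;
	for (i = 0; i < max_polls; i++) {
		status = *r->status;
		if (((status & r->enabled_bit) != 0) == want_en) {
			*status_out = status;
			return 0;
		}
	}
	*status_out = status;
	return -1;
}
```

Passing the register description explicitly also avoids relying on a local variable with a particular name being in scope at the call site.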

> +
> +#define cmdqv_write_config(_regval) \
> + tegra241_cmdqv_write_config(cmdqv, CMDQV, _regval)
> +#define vintf_write_config(_regval) \
> + tegra241_cmdqv_write_config(vintf, VINTF, _regval)
> +#define vcmdq_write_config(_regval) \
> + tegra241_cmdqv_write_config(vcmdq, VCMDQ, _regval)

More hidden access to stack values

> +/**
> + * struct tegra241_cmdqv - CMDQ-V for SMMUv3
> + * @smmu: SMMUv3 pointer
> + * @dev: Device pointer

This should probably be clarified as the device pointer to the ACPI
companion device

> +static void tegra241_cmdqv_handle_vintf0_error(struct tegra241_cmdqv *cmdqv)
> +{
> + struct tegra241_vintf *vintf = cmdqv->vintfs[0];
> + int i;
> +
> + /* Cache status to bypass VCMDQs until error is recovered */
> + atomic_set(&vintf->status, vintf_readl(STATUS));
> +
> + for (i = 0; i < 4; i++) {
> + u32 lvcmdq_err_map = vintf_readl_relaxed(CMDQ_ERR_MAP(i));
> +
> + while (lvcmdq_err_map) {
> + int lidx = ffs(lvcmdq_err_map) - 1;
> + struct tegra241_vcmdq *vcmdq = vintf->vcmdqs[lidx];
> + u32 gerrorn, gerror;
> +
> + lvcmdq_err_map &= ~BIT(lidx);
> +
> + __arm_smmu_cmdq_skip_err(cmdqv->smmu, &vcmdq->cmdq.q);
> +
> + gerrorn = vcmdq_page0_readl_relaxed(GERRORN);
> + gerror = vcmdq_page0_readl_relaxed(GERROR);
> +
> + vcmdq_page0_writel(gerror, GERRORN);
> + }
> + }
> +
> + /* Now error status should be clean, cache it again */
> + atomic_set(&vintf->status, vintf_readl(STATUS));
> +}
> +
> +static irqreturn_t tegra241_cmdqv_isr(int irq, void *devid)
> +{
> + struct tegra241_cmdqv *cmdqv = (struct tegra241_cmdqv *)devid;
> + u32 vintf_errs[2];
> + u32 vcmdq_errs[4];
> +
> + vintf_errs[0] = cmdqv_readl_relaxed(VINTF_ERR_MAP);
> + vintf_errs[1] = cmdqv_readl_relaxed(VINTF_ERR_MAP + 0x4);
> +
> + vcmdq_errs[0] = cmdqv_readl_relaxed(VCMDQ_ERR_MAP(0));
> + vcmdq_errs[1] = cmdqv_readl_relaxed(VCMDQ_ERR_MAP(1));
> + vcmdq_errs[2] = cmdqv_readl_relaxed(VCMDQ_ERR_MAP(2));
> + vcmdq_errs[3] = cmdqv_readl_relaxed(VCMDQ_ERR_MAP(3));
> +
> + cmdqv_warn("unexpected cmdqv error reported\n");
> + cmdqv_warn(" vintf_map: 0x%08X%08X\n", vintf_errs[1], vintf_errs[0]);
> + cmdqv_warn(" vcmdq_map: 0x%08X%08X%08X%08X\n",
> + vcmdq_errs[3], vcmdq_errs[2], vcmdq_errs[1], vcmdq_errs[0]);

Put warnings in one print only; spreading them like this just
increases the risk of tearing. It doesn't need to be all pretty.
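
A minimal userspace sketch of the single-print approach, assuming a hypothetical `format_err_maps()` helper: both error maps are formatted into one line, which a caller could then hand to a single print call. The message wording is illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the VINTF and VCMDQ error maps into one line so that a single
 * print call emits them together. Words are printed most significant
 * first, matching the order used in the quoted warnings. Returns the
 * snprintf() result.
 */
static int format_err_maps(char *buf, size_t len,
			   const uint32_t vintf_errs[2],
			   const uint32_t vcmdq_errs[4])
{
	assert(buf && len);

	return snprintf(buf, len,
			"unexpected error reported: vintf_map 0x%08X%08X, "
			"vcmdq_map 0x%08X%08X%08X%08X",
			(unsigned int)vintf_errs[1], (unsigned int)vintf_errs[0],
			(unsigned int)vcmdq_errs[3], (unsigned int)vcmdq_errs[2],
			(unsigned int)vcmdq_errs[1], (unsigned int)vcmdq_errs[0]);
}
```

In the driver this would presumably collapse into one dev_warn() call with the same format string, so no intermediate buffer would be needed.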

> +struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
> +{
> + struct tegra241_cmdqv *cmdqv = smmu->tegra241_cmdqv;
> + struct tegra241_vintf *vintf = cmdqv->vintfs[0];
> + struct tegra241_vcmdq *vcmdq;
> + u16 lidx;
> +
> + if (bypass_vcmdq)

READ_ONCE

> + return &smmu->cmdq;
> +
> + /* Use SMMU CMDQ if vintfs[0] is uninitialized */
> + if (!FIELD_GET(VINTF_ENABLED, atomic_read(&vintf->status)))
> + return &smmu->cmdq;
> +
> + /* Use SMMU CMDQ if vintfs[0] has error status */
> + if (FIELD_GET(VINTF_STATUS, atomic_read(&vintf->status)))
> + return &smmu->cmdq;

Why atomic_read? The unlocked interaction with
tegra241_cmdqv_handle_vintf0_error() doesn't seem especially sane IMHO

> +static void tegra241_vcmdq_hw_deinit(struct tegra241_vcmdq *vcmdq)
> +{
> + u32 gerrorn, gerror;
> +
> + if (vcmdq_write_config(0)) {
> + vcmdq_err("GERRORN=0x%X\n", vcmdq_page0_readl_relaxed(GERRORN));
> + vcmdq_err("GERROR=0x%X\n", vcmdq_page0_readl_relaxed(GERROR));
> + vcmdq_err("CONS=0x%X\n", vcmdq_page0_readl_relaxed(CONS));

Fewer prints, and include a unique message about why this is being
printed.

> + }
> + vcmdq_page0_writel_relaxed(0, PROD);
> + vcmdq_page0_writel_relaxed(0, CONS);
> + vcmdq_page1_writeq_relaxed(0, BASE);
> + vcmdq_page1_writeq_relaxed(0, CONS_INDX_BASE);
> +
> + gerrorn = vcmdq_page0_readl_relaxed(GERRORN);
> + gerror = vcmdq_page0_readl_relaxed(GERROR);
> + if (gerror != gerrorn) {
> + vcmdq_info("Uncleared error detected, resetting\n");
> + vcmdq_page0_writel(gerror, GERRORN);
> + }
> +
> + vcmdq_dbg("deinited\n");
> +}
> +
> +static int tegra241_vcmdq_hw_init(struct tegra241_vcmdq *vcmdq)
> +{
> + int ret;
> +
> + /* Configure and enable the vcmdq */
> + tegra241_vcmdq_hw_deinit(vcmdq);
> +
> + vcmdq_page1_writeq_relaxed(vcmdq->cmdq.q.q_base, BASE);
> +
> + ret = vcmdq_write_config(VCMDQ_EN);
> + if (ret) {
> + vcmdq_err("GERRORN=0x%X\n", vcmdq_page0_readl_relaxed(GERRORN));
> + vcmdq_err("GERROR=0x%X\n", vcmdq_page0_readl_relaxed(GERROR));
> + vcmdq_err("CONS=0x%X\n", vcmdq_page0_readl_relaxed(CONS));
> + return ret;

Same print?

> +static void tegra241_vcmdq_free_smmu_cmdq(struct tegra241_vcmdq *vcmdq)
> +{
> + struct tegra241_cmdqv *cmdqv = vcmdq->cmdqv;
> + struct arm_smmu_queue *q = &vcmdq->cmdq.q;
> + size_t nents = 1 << q->llq.max_n_shift;
> +
> + dmam_free_coherent(cmdqv->smmu->dev, (nents * CMDQ_ENT_DWORDS) << 3,
> + q->base, q->base_dma);

If we are calling dmam_free, do we really need devm at all?

> +static struct tegra241_vcmdq *
> +tegra241_vintf_lvcmdq_alloc(struct tegra241_vintf *vintf, u16 lidx)
> +{
> + struct tegra241_cmdqv *cmdqv = vintf->cmdqv;
> + struct tegra241_vcmdq *vcmdq;
> + int ret;
> +
> + vcmdq = devm_kzalloc(cmdqv->dev, sizeof(*vcmdq), GFP_KERNEL);
> + if (!vcmdq)
> + return ERR_PTR(-ENOMEM);
> +
> + ret = tegra241_vintf_lvcmdq_init(vintf, lidx, vcmdq);
> + if (ret)
> + goto free_vcmdq;
> +
> + /* Setup struct arm_smmu_cmdq data members */
> + ret = tegra241_vcmdq_alloc_smmu_cmdq(vcmdq);
> + if (ret)
> + goto deinit_lvcmdq;
> +
> + ret = tegra241_vcmdq_hw_init(vcmdq);
> + if (ret)
> + goto free_queue;
> +
> + vcmdq_dbg("allocated\n");
> + return vcmdq;
> +free_queue:
> + tegra241_vcmdq_free_smmu_cmdq(vcmdq);
> +deinit_lvcmdq:
> + tegra241_vintf_lvcmdq_deinit(vcmdq);
> +free_vcmdq:
> + devm_kfree(cmdqv->dev, vcmdq);
> + return ERR_PTR(ret);
> +}
> +
> +static void tegra241_vintf_lvcmdq_free(struct tegra241_vcmdq *vcmdq)
> +{
> + tegra241_vcmdq_hw_deinit(vcmdq);
> + tegra241_vcmdq_free_smmu_cmdq(vcmdq);
> + tegra241_vintf_lvcmdq_deinit(vcmdq);
> + devm_kfree(vcmdq->cmdqv->dev, vcmdq);

Ditto for devm_kfree.

> +struct tegra241_cmdqv *
> +tegra241_cmdqv_acpi_probe(struct arm_smmu_device *smmu, int id)

id is a u32.

It might be clearer to just pass in the struct
acpi_iort_node *?

> +{
> + struct tegra241_cmdqv *cmdqv;
> +
> + cmdqv = tegra241_cmdqv_find_resource(smmu, id);
> + if (!cmdqv)
> + return NULL;
> +
> + if (tegra241_cmdqv_probe(cmdqv)) {
> + if (cmdqv->irq > 0)
> + devm_free_irq(smmu->dev, cmdqv->irq, cmdqv);
> + devm_iounmap(smmu->dev, cmdqv->base);
> + devm_kfree(smmu->dev, cmdqv);
> + return NULL;

Oh. Please don't use devm at all in this code then, it is not attached
to a probed driver with the proper scope, devm isn't going to work in
a sensible way.

Jason

2024-04-30 17:07:45

by Jason Gunthorpe

Subject: Re: [PATCH v6 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF

On Mon, Apr 29, 2024 at 09:43:49PM -0700, Nicolin Chen wrote:
> -struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
> +static bool tegra241_vintf_support_cmds(struct tegra241_vintf *vintf,
> + u64 *cmds, int n)
> +{
> + int i;
> +
> + /* VINTF owned by hypervisor can execute any command */
> + if (vintf->hyp_own)
> + return true;
> +
> + /* Guest-owned VINTF must check against the list of supported CMDs */
> + for (i = 0; i < n; i++) {
> + switch (FIELD_GET(CMDQ_0_OP, cmds[i * CMDQ_ENT_DWORDS])) {
> + case CMDQ_OP_TLBI_NH_ASID:
> + case CMDQ_OP_TLBI_NH_VA:
> + case CMDQ_OP_ATC_INV:

So CMDQ only works if not ARM_SMMU_FEAT_E2H? Probably worth mentioning
that too along with the discussion about HYP


> + continue;
> + default:
> + return false;
> + }
> + }
> +
> + return true;
> +}

For a performance path this looping seems disappointing.. The callers
don't actually mix different command types. Is there something
preventing adding a parameter at the callers?

Actually looking at this more closely, isn't the command q selection
in the wrong place?

Ie this batch stuff:

static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq_batch *cmds,
struct arm_smmu_cmdq_ent *cmd)
{
int index;

if (cmds->num == CMDQ_BATCH_ENTRIES - 1 &&
(smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC)) {
arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);
cmds->num = 0;
}

if (cmds->num == CMDQ_BATCH_ENTRIES) {
arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
cmds->num = 0;
}

index = cmds->num * CMDQ_ENT_DWORDS;
if (unlikely(arm_smmu_cmdq_build_cmd(&cmds->cmds[index], cmd))) {
dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n",
cmd->opcode);
return;
}

Has to push everything, across all the iterations of add/submit, onto
the same CMDQ otherwise the SYNC won't be properly flushing?

But each arm_smmu_cmdq_issue_cmdlist() calls its own get q
function. Yes, they probably return the same Q since we are probably
on the same CPU, but it seems logically wrong (and slower!) to
organize it like this.

I would expect the Q to be selected when the struct
arm_smmu_cmdq_batch is allocated on the stack, and be the same for the
entire batch operation. Not only do we spend less time trying to
compute the Q to use, we have a built-in guarantee that every command
will be on the same Q as the fencing SYNC.

Something sort of like this as another patch?

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 268da20baa4e9c..d8c9597878315a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -357,11 +357,22 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
return 0;
}

-static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu,
- u64 *cmds, int n)
+enum required_cmds {
+ CMDS_ALL,
+ /*
+ * Commands will be one of:
+ * CMDQ_OP_ATC_INV, CMDQ_OP_TLBI_EL2_VA, CMDQ_OP_TLBI_NH_VA,
+ * CMDQ_OP_TLBI_EL2_ASID, CMDQ_OP_TLBI_NH_ASID, CMDQ_OP_TLBI_S2_IPA,
+ * CMDQ_OP_TLBI_S12_VMALL, CMDQ_OP_SYNC
+ */
+ CMDS_INVALIDATION,
+};
+
+static struct arm_smmu_cmdq *
+arm_smmu_get_cmdq(struct arm_smmu_device *smmu, enum required_cmds required)
{
if (smmu->tegra241_cmdqv)
- return tegra241_cmdqv_get_cmdq(smmu, cmds, n);
+ return tegra241_cmdqv_get_cmdq(smmu, required);

return &smmu->cmdq;
}
@@ -766,13 +777,13 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq, u64 *cmds,
* CPU will appear before any of the commands from the other CPU.
*/
static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
- u64 *cmds, int n, bool sync)
+ struct arm_smmu_cmdq *cmdq, u64 *cmds,
+ int n, bool sync)
{
u64 cmd_sync[CMDQ_ENT_DWORDS];
u32 prod;
unsigned long flags;
bool owner;
- struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu, cmds, n);
struct arm_smmu_ll_queue llq, head;
int ret = 0;

@@ -897,7 +908,8 @@ static int __arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
return -EINVAL;
}

- return arm_smmu_cmdq_issue_cmdlist(smmu, cmd, 1, sync);
+ return arm_smmu_cmdq_issue_cmdlist(
+ smmu, arm_smmu_get_cmdq(smmu, CMDS_ALL), cmd, 1, sync);
}

static int arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
@@ -912,6 +924,14 @@ static int arm_smmu_cmdq_issue_cmd_with_sync(struct arm_smmu_device *smmu,
return __arm_smmu_cmdq_issue_cmd(smmu, ent, true);
}

+static void arm_smmu_cmdq_batch_init(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq_batch *cmds,
+ enum required_cmds required)
+{
+ cmds->num = 0;
+ cmds->q = arm_smmu_get_cmdq(smmu, required);
+}
+
static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq_batch *cmds,
struct arm_smmu_cmdq_ent *cmd)
@@ -920,12 +940,14 @@ static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,

if (cmds->num == CMDQ_BATCH_ENTRIES - 1 &&
(smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC)) {
- arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);
+ arm_smmu_cmdq_issue_cmdlist(smmu, cmds->q, cmds->cmds,
+ cmds->num, true);
cmds->num = 0;
}

if (cmds->num == CMDQ_BATCH_ENTRIES) {
- arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
+ arm_smmu_cmdq_issue_cmdlist(smmu, cmds->q, cmds->cmds,
+ cmds->num, false);
cmds->num = 0;
}

@@ -942,7 +964,8 @@ static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq_batch *cmds)
{
- return arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);
+ return arm_smmu_cmdq_issue_cmdlist(smmu, cmds->q, cmds->cmds, cmds->num,
+ true);
}

static void arm_smmu_page_response(struct device *dev, struct iopf_fault *unused,
@@ -1181,7 +1204,7 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
},
};

- cmds.num = 0;
+ arm_smmu_cmdq_batch_init(smmu, &cmds, CMDS_ALL);
for (i = 0; i < master->num_streams; i++) {
cmd.cfgi.sid = master->streams[i].id;
arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
@@ -2045,7 +2068,7 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master,

arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);

- cmds.num = 0;
+ arm_smmu_cmdq_batch_init(master->smmu, &cmds, CMDS_INVALIDATION);
for (i = 0; i < master->num_streams; i++) {
cmd.atc.sid = master->streams[i].id;
arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd);
@@ -2083,7 +2106,7 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
if (!atomic_read(&smmu_domain->nr_ats_masters))
return 0;

- cmds.num = 0;
+ arm_smmu_cmdq_batch_init(smmu_domain->smmu, &cmds, CMDS_INVALIDATION);

spin_lock_irqsave(&smmu_domain->devices_lock, flags);
list_for_each_entry(master_domain, &smmu_domain->devices,
@@ -2161,7 +2184,7 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
num_pages++;
}

- cmds.num = 0;
+ arm_smmu_cmdq_batch_init(smmu_domain->smmu, &cmds, CMDS_INVALIDATION);

while (iova < end) {
if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 9412fa4ff5e045..5651ea2541a0a2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -576,6 +576,7 @@ struct arm_smmu_cmdq {

struct arm_smmu_cmdq_batch {
u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
+ struct arm_smmu_cmdq *q;
int num;
};


2024-04-30 18:59:09

by Nicolin Chen

Subject: Re: [PATCH v6 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF

On Tue, Apr 30, 2024 at 02:06:55PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2024 at 09:43:49PM -0700, Nicolin Chen wrote:
> > -struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
> > +static bool tegra241_vintf_support_cmds(struct tegra241_vintf *vintf,
> > + u64 *cmds, int n)
> > +{
> > + int i;
> > +
> > + /* VINTF owned by hypervisor can execute any command */
> > + if (vintf->hyp_own)
> > + return true;
> > +
> > + /* Guest-owned VINTF must Check against the list of supported CMDs */
> > + for (i = 0; i < n; i++) {
> > + switch (FIELD_GET(CMDQ_0_OP, cmds[i * CMDQ_ENT_DWORDS])) {
> > + case CMDQ_OP_TLBI_NH_ASID:
> > + case CMDQ_OP_TLBI_NH_VA:
> > + case CMDQ_OP_ATC_INV:
>
> So CMDQ only works if not ARM_SMMU_FEAT_E2H? Probably worth mentioning
> that too along with the discussion about HYP

Nod. EL2/EL3 commands aren't supported. And they aren't supposed
to be issued by a guest either, since ARM64_HAS_VIRT_HOST_EXTN is
the feature of "Virtualization Host Extensions"?

>
> > + continue;
> > + default:
> > + return false;
> > + }
> > + }
> > +
> > + return true;
> > +}
>
> For a performance path this looping seems disappointing.. The callers
> don't actually mix different command types. Is there something
> preventing adding a parameter at the callers?

The callers don't seem to mix at this moment. Yet we would have
to be extra careful against any future SMMU patch that may mix
commands?

> Actually looking at this more closely, isn't the command q selection
> in the wrong place?
>
> Ie this batch stuff:
>
> static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
> struct arm_smmu_cmdq_batch *cmds,
> struct arm_smmu_cmdq_ent *cmd)
> {
> int index;
>
> if (cmds->num == CMDQ_BATCH_ENTRIES - 1 &&
> (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC)) {
> arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);
> cmds->num = 0;
> }
>
> if (cmds->num == CMDQ_BATCH_ENTRIES) {
> arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
> cmds->num = 0;
> }
>
> index = cmds->num * CMDQ_ENT_DWORDS;
> if (unlikely(arm_smmu_cmdq_build_cmd(&cmds->cmds[index], cmd))) {
> dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n",
> cmd->opcode);
> return;
> }
>
> Has to push everything, across all the iterations of add/submit, onto
> the same CMDQ otherwise the SYNC won't be properly flushing?

ECMDQ seems to have such a limitation, but VCMDQs can get away
as HW can insert a SYNC to a queue that doesn't end with a SYNC.

> But each arm_smmu_cmdq_issue_cmdlist() calls its own get q
> function. Yes, they probably return the same Q since we are probably
> on the same CPU, but it seems logically wrong (and slower!) to
> organize it like this.
>
> I would expect the Q to be selected when the struct
> arm_smmu_cmdq_batch is allocated on the stack, and be the same for the
> entire batch operation. Not only do we spend less time trying to
> compute the Q to use, we have a built-in guarantee that every command
> will be on the same Q as the fencing SYNC.

This seems to be helpful to ECMDQ. The current version disables
the preempts, which feels costly to me.

> Something sort of like this as another patch?
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 268da20baa4e9c..d8c9597878315a 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -357,11 +357,22 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> return 0;
> }
>
> -static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu,
> - u64 *cmds, int n)
> +enum required_cmds {
> + CMDS_ALL,
> + /*
> + * Commands will be one of:
> + * CMDQ_OP_ATC_INV, CMDQ_OP_TLBI_EL2_VA, CMDQ_OP_TLBI_NH_VA,
> + * CMDQ_OP_TLBI_EL2_ASID, CMDQ_OP_TLBI_NH_ASID, CMDQ_OP_TLBI_S2_IPA,
> + * CMDQ_OP_TLBI_S12_VMALL, CMDQ_OP_SYNC
> + */
> + CMDS_INVALIDATION,
> +};

Hmm, guest-owned VCMDQs don't support EL2 commands. So, it feels
somewhat complicated to decouple them further in the callers
of arm_smmu_cmdq_batch_add(). And I am not sure there is a use
case for a guest issuing CMDQ_OP_TLBI_S2_IPA/CMDQ_OP_TLBI_S12_VMALL
either, though the HW surprisingly supports these two.

Perhaps we could just scan the first command in the batch, trusting
that no one will covertly sneak different commands into it?

Otherwise, there has to be a get_supported_cmdq callback so batch
or its callers can avoid adding unsupported commands in the first
place.

Thanks
Nicolin

2024-05-01 00:18:22

by Jason Gunthorpe

Subject: Re: [PATCH v6 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF

On Tue, Apr 30, 2024 at 11:58:44AM -0700, Nicolin Chen wrote:

> > Has to push everything, across all the iterations of add/submit, onto
> > the same CMDQ otherwise the SYNC won't be properly flushing?
>
> ECMDQ seems to have such a limitation, but VCMDQs can get away
> as HW can insert a SYNC to a queue that doesn't end with a SYNC.

That seems like a strange thing to do in HW, but I recall you
mentioned it once before. Still, I'm not sure there is any merit in
relying on it?

> > But each arm_smmu_cmdq_issue_cmdlist() calls its own get q
> > function. Yes, they probably return the same Q since we are probably
> > on the same CPU, but it seems logically wrong (and slower!) to
> > organize it like this.
> >
> > I would expect the Q to be selected when the struct
> > arm_smmu_cmdq_batch is allocated on the stack, and be the same for the
> > entire batch operation. Not only do we spend less time trying to
> > compute the Q to use, we have a built-in guarantee that every command
> > will be on the same Q as the fencing SYNC.
>
> This seems to be helpful to ECMDQ. The current version disables
> the preempts, which feels costly to me.

Oh, yes, definitely should work like this then!

> > Something sort of like this as another patch?
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 268da20baa4e9c..d8c9597878315a 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -357,11 +357,22 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> > return 0;
> > }
> >
> > -static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu,
> > - u64 *cmds, int n)
> > +enum required_cmds {
> > + CMDS_ALL,
> > + /*
> > + * Commands will be one of:
> > + * CMDQ_OP_ATC_INV, CMDQ_OP_TLBI_EL2_VA, CMDQ_OP_TLBI_NH_VA,
> > + * CMDQ_OP_TLBI_EL2_ASID, CMDQ_OP_TLBI_NH_ASID, CMDQ_OP_TLBI_S2_IPA,
> > + * CMDQ_OP_TLBI_S12_VMALL, CMDQ_OP_SYNC
> > + */
> > + CMDS_INVALIDATION,
> > +};
>
> Hmm, guest-owned VCMDQs don't support EL2 commands. So, it feels
> somewhat complicated to decouple them further in the callers
> of arm_smmu_cmdq_batch_add(). And I am not sure there is a use
> case for a guest issuing CMDQ_OP_TLBI_S2_IPA/CMDQ_OP_TLBI_S12_VMALL
> either, though the HW surprisingly supports these two.

These are the max commands that could be issued, but they are all
gated based on the feature bits. The ones VCMDQ doesn't support are not
going to be issued because of the feature bits. You could test and
enforce this when probing the ECMDQ parts.

> Perhaps we could just scan the first command in the batch, trusting
> that no one will covertly sneak different commands into it?

Yes with the current design that does seem to work, but it also feels
a bit obtuse.

> Otherwise, there has to be a get_supported_cmdq callback so batch
> or its callers can avoid adding unsupported commands in the first
> place.

If you really feel strongly, the invalidation could be split into
S1/S2/S1_VM groupings that align with the feature bits and that could
be passed down from one step above. But I don't think the complexity
is really needed. It is better to deal with it through the feature
mechanism.

If high speed invalidation is supported then the invalidation queue
must support all the invalidation commands used by the SMMU's active
feature set, otherwise do not enable the invalidation queue. It does
make logical sense.

Jason

2024-05-01 13:00:58

by Jason Gunthorpe

Subject: Re: [PATCH v6 5/6] iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV

On Tue, Apr 30, 2024 at 11:08:55AM -0700, Nicolin Chen wrote:
> (Removing chunks that I simply ack)
>
> On Tue, Apr 30, 2024 at 01:35:45PM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2024 at 09:43:48PM -0700, Nicolin Chen wrote:
>
> > > +/* MMIO helpers */
> > > +#define cmdqv_readl(reg) \
> > > + readl(cmdqv->base + TEGRA241_CMDQV_##reg)
> > > +#define cmdqv_readl_relaxed(reg) \
> > > + readl_relaxed(cmdqv->base + TEGRA241_CMDQV_##reg)
> > > +#define cmdqv_writel(val, reg) \
> > > + writel((val), cmdqv->base + TEGRA241_CMDQV_##reg)
> > > +#define cmdqv_writel_relaxed(val, reg) \
> > > + writel_relaxed((val), cmdqv->base + TEGRA241_CMDQV_##reg)
> >
> > Please don't hide access to a stack variable in a macro, and I'm not
> > keen on the ##reg scheme either - it makes it much harder to search
> > for things.
>
> I can pass in cmdqv/vintf/vcmdq pointers, if it would be better.
>
> > Really this all seems like a lot of overkill to make a little bit of
> > shorthand. It is not so wordy just to type it out:
> >
> > readl(vintf->base + TEGRA241_VINTF_CONFIG)
>
> vintf_readl(vintf, CONFIG) is much shorter. Doing so reduced the
> line breaks in quite a lot of places, so overall the driver looks a
> lot cleaner to me.

We don't have the strict 80 column limit now, it would be fine to go a
few extra to avoid the breaks.

Certainly preferred to these readability damaging macros.

> I can probably change these logging helpers to inline functions.

Just call the normal logging functions directly.

> > > +#define vintf_warn(fmt, ...) \
> > > + dev_warn(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > > +#define vintf_err(fmt, ...) \
> > > + dev_err(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > > +#define vintf_info(fmt, ...) \
> > > + dev_info(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > > +#define vintf_dbg(fmt, ...) \
> > > + dev_dbg(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > > +
> > > +#define vcmdq_warn(fmt, ...) \
> > > + ({ \
> > > + struct tegra241_vintf *vintf = vcmdq->vintf; \
> > > + if (vintf) \
> > > + vintf_warn("VCMDQ%u/LVCMDQ%u: " fmt, \
> > > + vcmdq->idx, vcmdq->lidx, \
> > > + ##__VA_ARGS__); \
> > > + else \
> > > + dev_warn(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt, \
> > > + vcmdq->idx, ##__VA_ARGS__); \
> > > + })
>
> > Some of these are barely used, is it worth all these macros??
>
> Only vcmdq_warn isn't called. But I think it would be useful.
> I could also find a place to call it, if that's a must.

Just call the normal logging functions; there are so few callers that
typing out the VCMDQ%u is not going to be so bad.



> > > +
> > > +/* Configuring and polling helpers */
> > > +#define tegra241_cmdqv_write_config(_owner, _OWNER, _regval) \
> > > + ({ \
> > > + bool _en = (_regval) & _OWNER##_EN; \
> > > + u32 _status; \
> > > + int _ret; \
> > > + writel((_regval), _owner->base + TEGRA241_##_OWNER##_CONFIG); \
> > > + _ret = readl_poll_timeout( \
> > > + _owner->base + TEGRA241_##_OWNER##_STATUS, _status, \
> > > + _en ? (_regval) & _OWNER##_ENABLED : \
> > > + !((_regval) & _OWNER##_ENABLED), \
> > > + 1, ARM_SMMU_POLL_TIMEOUT_US); \
> > > + if (_ret) \
> > > + _owner##_err("failed to %sable, STATUS = 0x%08X\n", \
> > > + _en ? "en" : "dis", _status); \
> > > + atomic_set(&_owner->status, _status); \
> > > + _ret; \
> > > + })
> >
> > I feel like this could be an actual inline function without the macro
> > wrapper with a little fiddling.
>
> It would be unrolled to three mostly identical inline functions:
> tegra241_cmdqv_write_config(cmdqv, regval)
> tegra241_vintf_write_config(vintf, regval)
> tegra241_vcmdq_write_config(vcmdq, regval)

Expand the parameters in the caller:

__do_write_config(owner->base, &owner->status, _CMDQV_EN, TEGRA241_CMDQ_CONFIG,
TEGRA241_CMDQ_STATUS, _CMDQ_ENABLED)
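
To illustrate the shape of that suggestion, here is a minimal userspace sketch of a single helper that takes everything as explicit parameters. It is only a sketch under stated assumptions: the register indices, bit values, retry count, and mock_writel() are hypothetical stand-ins, not the driver's definitions, and the fake device flips its status immediately instead of being polled with readl_poll_timeout().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register layout and bits, for illustration only. */
#define REG_CONFIG	0		/* word index of a CONFIG register */
#define REG_STATUS	1		/* word index of a STATUS register */
#define CFG_EN		(1u << 0)
#define STS_ENABLED	(1u << 0)
#define POLL_RETRIES	100

/* Stand-in for writel(): the fake device mirrors EN into ENABLED at once. */
static void mock_writel(uint32_t val, volatile uint32_t *regs, unsigned int reg)
{
	regs[reg] = val;
	if (reg == REG_CONFIG)
		regs[REG_STATUS] = (val & CFG_EN) ? STS_ENABLED : 0;
}

/*
 * One helper for every owner type: the caller spells out the register
 * window, the cached status word, and the owner-specific constants.
 * Returns 0 once the status bit matches the requested state, -1 if not.
 */
static int do_write_config(volatile uint32_t *regs, uint32_t *cached_status,
			   uint32_t regval, uint32_t en_bit,
			   unsigned int config_reg, unsigned int status_reg,
			   uint32_t enabled_bit)
{
	bool en = regval & en_bit;
	uint32_t status = 0;
	int ret = -1;
	int i;

	assert(regs && cached_status);

	mock_writel(regval, regs, config_reg);
	for (i = 0; i < POLL_RETRIES; i++) {
		status = regs[status_reg];
		if (en == !!(status & enabled_bit)) {
			ret = 0;
			break;
		}
	}
	if (ret)
		fprintf(stderr, "failed to %sable, STATUS = 0x%08X\n",
			en ? "en" : "dis", (unsigned int)status);
	*cached_status = status;
	return ret;
}
```

A caller would then pass its own window and constants, e.g. do_write_config(regs, &status, CFG_EN, CFG_EN, REG_CONFIG, REG_STATUS, STS_ENABLED), so nothing is picked up implicitly from the calling scope.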

> > > +#define cmdqv_write_config(_regval) \
> > > + tegra241_cmdqv_write_config(cmdqv, CMDQV, _regval)
> > > +#define vintf_write_config(_regval) \
> > > + tegra241_cmdqv_write_config(vintf, VINTF, _regval)
> > > +#define vcmdq_write_config(_regval) \
> > > + tegra241_cmdqv_write_config(vcmdq, VCMDQ, _regval)
> >
> > More hidden access to stack values
>
> Btw, any reason for forbidding this practice? It will break the
> build if something goes wrong, which seems to be pretty easy to
> catch.

It is the kernel consensus not to do that. Function-like macros should
act like functions and not reach into some other stack frame. It makes
it very hard to follow the calling function if you can't follow where
the references are.
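
A small userspace sketch of the difference, assuming a simplified struct and a hypothetical register offset (neither is the driver's real definition): the discouraged form is shown only in a comment, and the preferred form is a plain inline function whose inputs are all visible at the call site.

```c
#include <assert.h>
#include <stdint.h>

#define VINTF_NUM_REGS	4	/* hypothetical size of the fake window */
#define VINTF_CONFIG	0	/* hypothetical word offset */

/* Hypothetical, simplified stand-in for the driver's VINTF structure. */
struct vintf {
	uint32_t regs[VINTF_NUM_REGS];	/* fake MMIO window */
	unsigned int idx;
};

/*
 * Discouraged: this only compiles where a local named "vintf" happens
 * to exist, and the token pasting hides the full register name from a
 * text search:
 *
 *	#define vintf_readl(reg) (vintf->regs[VINTF_##reg])
 */

/* Preferred: every input is an explicit argument. */
static inline uint32_t vintf_read(const struct vintf *vintf, unsigned int reg)
{
	assert(vintf && reg < VINTF_NUM_REGS);
	return vintf->regs[reg];
}
```

With this form, vintf_read(vintf, VINTF_CONFIG) shows both the object and the full register name where it is used.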

> > > + /* Use SMMU CMDQ if vintfs[0] is uninitialized */
> > > + if (!FIELD_GET(VINTF_ENABLED, atomic_read(&vintf->status)))
> > > + return &smmu->cmdq;
> > > +
> > > + /* Use SMMU CMDQ if vintfs[0] has error status */
> > > + if (FIELD_GET(VINTF_STATUS, atomic_read(&vintf->status)))
> > > + return &smmu->cmdq;
> >
> > Why atomic_read? The unlocked interaction with
> > tegra241_cmdqv_handle_vintf0_error() doesn't seem especially sane IMHO
>
> Race between this get_cmdq() and the isr. Any alternative practice?

It doesn't fix any real race, I'm not sure what this is supposed to be
doing. The cmdq becomes broken and you get an ISR, so before the ISR
it will still post but get stuck, during the ISR it will avoid
posting, and after it will go back to posting?

Why? Just always post to the Q and let the ISR fix it?

> > > +static void tegra241_vcmdq_hw_deinit(struct tegra241_vcmdq *vcmdq)
> > > +{
> > > + u32 gerrorn, gerror;
> > > +
> > > + if (vcmdq_write_config(0)) {
> > > + vcmdq_err("GERRORN=0x%X\n", vcmdq_page0_readl_relaxed(GERRORN));
> > > + vcmdq_err("GERROR=0x%X\n", vcmdq_page0_readl_relaxed(GERROR));
> > > + vcmdq_err("CONS=0x%X\n", vcmdq_page0_readl_relaxed(CONS));
> >
> > Less prints, include a unique message about why this is being
> > printed..
>
> Something must be wrong if disabling VCMDQ fails, so the prints of
> error register values would be helpful. And "failed to disable" is
> already printed by the vcmdq_write_config() call. I can merge them
> into one vcmdq_err call though.

Print on one line
> > > +static void tegra241_vcmdq_free_smmu_cmdq(struct tegra241_vcmdq *vcmdq)
> > > +{
> > > + struct tegra241_cmdqv *cmdqv = vcmdq->cmdqv;
> > > + struct arm_smmu_queue *q = &vcmdq->cmdq.q;
> > > + size_t nents = 1 << q->llq.max_n_shift;
> > > +
> > > + dmam_free_coherent(cmdqv->smmu->dev, (nents * CMDQ_ENT_DWORDS) << 3,
> > > + q->base, q->base_dma);
> >
> > If we are calling dmam_free, do we really need devm at all?
>
> Hmm. This is a part of SMMU's probe/device_reset().

But that is a proper device driver, this isn't.

> > > + struct tegra241_cmdqv *cmdqv;
> > > +
> > > + cmdqv = tegra241_cmdqv_find_resource(smmu, id);
> > > + if (!cmdqv)
> > > + return NULL;
> > > +
> > > + if (tegra241_cmdqv_probe(cmdqv)) {
> > > + if (cmdqv->irq > 0)
> > > + devm_free_irq(smmu->dev, cmdqv->irq, cmdqv);
> > > + devm_iounmap(smmu->dev, cmdqv->base);
> > > + devm_kfree(smmu->dev, cmdqv);
> > > + return NULL;
> >
> > Oh. Please don't use devm at all in this code then, it is not attached
> > to a probed driver with the proper scope, devm isn't going to work in
> > > a sensible way.
>
> Mind elaborating "it is not"? This function is called by
> arm_smmu_device_acpi_probe and arm_smmu_device_probe.

Normal devm usage will unwind the devm allocations when probe fails.

That doesn't happen here, you open coded the unwind above, and then
you have open coded freeing in another place anyhow.

So just don't use it. There is no value if the places where it should
work automatically are not functioning.
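
For illustration, a userspace sketch of the explicit alternative: plain allocations, a goto-based unwind on the failure path, and a matching remove function. All names here are hypothetical, calloc()/free() merely stand in for the kernel allocation and mapping calls, and the counter exists only to make leaks observable in the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct tegra241_cmdqv. */
struct cmdqv {
	void *regs;		/* stands in for the mapped MMIO window */
	int irq_requested;	/* stands in for a requested interrupt */
};

static int live_allocs;		/* crude leak counter for the demonstration */

static void *tracked_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* fail_probe simulates the hardware probe step returning an error. */
static struct cmdqv *cmdqv_create(int fail_probe)
{
	struct cmdqv *c = tracked_alloc(sizeof(*c));

	if (!c)
		return NULL;
	c->regs = tracked_alloc(64);
	if (!c->regs)
		goto free_cmdqv;
	c->irq_requested = 1;
	if (fail_probe)
		goto free_irq;
	return c;

free_irq:
	c->irq_requested = 0;
	tracked_free(c->regs);
free_cmdqv:
	tracked_free(c);
	return NULL;
}

/* Explicit teardown, mirroring the unwind order above. */
static void cmdqv_remove(struct cmdqv *c)
{
	if (!c)
		return;
	assert(c->regs != NULL);	/* a fully created object is always mapped */
	c->irq_requested = 0;
	tracked_free(c->regs);
	tracked_free(c);
}
```

The point of the sketch is only that the create and remove paths own every resource visibly, so nothing depends on a managed-resource scope that never fires.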

Jason

2024-05-01 16:33:17

by Nicolin Chen

Subject: Re: [PATCH v6 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF

On Tue, Apr 30, 2024 at 09:17:58PM -0300, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 11:58:44AM -0700, Nicolin Chen wrote:
>
> > > Has to push everything, across all the iterations of add/submit, onto
> > > the same CMDQ otherwise the SYNC won't be properly flushing?
> >
> > ECMDQ seems to have such a limitation, but VCMDQs can get away
> > as HW can insert a SYNC to a queue that doesn't end with a SYNC.
>
> That seems like a strange thing to do in HW, but I recall you
> mentioned it once before. Still, I'm not sure there is any merit in
> relying on it?

I was hoping to get some idea from the designer. Yet, at this
moment, let's say there's likely no merit besides letting SW care
less and stay simpler, AFAIK.

Robin previously remarked that there could be some performance
impact from such a feature, so I think adding your patch would
be nicer.

> > > Something sort of like this as another patch?
> > >
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index 268da20baa4e9c..d8c9597878315a 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -357,11 +357,22 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> > > return 0;
> > > }
> > >
> > > -static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu,
> > > - u64 *cmds, int n)
> > > +enum required_cmds {
> > > + CMDS_ALL,
> > > + /*
> > > + * Commands will be one of:
> > > + * CMDQ_OP_ATC_INV, CMDQ_OP_TLBI_EL2_VA, CMDQ_OP_TLBI_NH_VA,
> > > + * CMDQ_OP_TLBI_EL2_ASID, CMDQ_OP_TLBI_NH_ASID, CMDQ_OP_TLBI_S2_IPA,
> > > + * CMDQ_OP_TLBI_S12_VMALL, CMDQ_OP_SYNC
> > > + */
> > > + CMDS_INVALIDATION,
> > > +};
> >
> > Hmm, guest-owned VCMDQs don't support EL2 commands. So, it feels
> > somewhat complicated to decouple them further in the callers
> > of arm_smmu_cmdq_batch_add(). And I am not sure there is a use
> > case for a guest issuing CMDQ_OP_TLBI_S2_IPA/CMDQ_OP_TLBI_S12_VMALL
> > either, though the HW surprisingly supports these two.
>
> These are the max commands that could be issued, but they are all
> gated based on the feature bits. The ones VCMDQ doesn't support are not
> going to be issued because of the feature bits. You could test and
> enforce this when probing the ECMDQ parts.

Ah, I see. So cmdqv's probe() could check the smmu->features
against ARM_SMMU_FEAT_E2H, and disable VINTF0 right away, in
case of guest-owned while E2H is present.

And we could do the same for ARM_SMMU_FEAT_TRANS_S2, stating
that the driver does not expect use cases of issuing S2 TLBI
commands from the guest OS.
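
A rough sketch of that probe-time check, written as standalone C so it can be tried outside the kernel. The structs and the FEAT_* bit values are hypothetical stand-ins; in the driver this would test smmu->features against ARM_SMMU_FEAT_E2H and ARM_SMMU_FEAT_TRANS_S2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values standing in for the ARM_SMMU_FEAT_* flags. */
#define FEAT_E2H	(1u << 0)
#define FEAT_TRANS_S2	(1u << 1)

/* Hypothetical, simplified views of the SMMU and VINTF0 state. */
struct smmu_info {
	uint32_t features;
};

struct vintf_info {
	bool hyp_own;	/* true if the hypervisor owns this VINTF */
	bool enabled;	/* result of the probe-time decision */
};

/*
 * Decide once, at probe time, whether VINTF0 may be used at all.
 * A guest-owned VINTF is disabled if the active feature set could
 * produce EL2 or stage-2 invalidation commands.
 */
static bool vintf0_probe_check(const struct smmu_info *smmu,
			       struct vintf_info *vintf)
{
	assert(smmu && vintf);

	if (!vintf->hyp_own &&
	    (smmu->features & (FEAT_E2H | FEAT_TRANS_S2)))
		vintf->enabled = false;
	else
		vintf->enabled = true;
	return vintf->enabled;
}
```

Doing the check once at probe time keeps the per-command path free of opcode filtering.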

Thanks
Nicolin

2024-05-02 12:41:20

by Jason Gunthorpe

Subject: Re: [PATCH v6 5/6] iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV

On Wed, May 01, 2024 at 10:43:39AM -0700, Nicolin Chen wrote:
> > It doesn't fix any real race, I'm not sure what this is supposed to be
> > doing. The cmdq becomes broken and you get an ISR, so before the ISR
> > it will still post but get stuck, during the ISR it will avoid
> > posting, and after it will go back to posting?
> >
> > Why? Just always post to the Q and let the ISR fix it?
>
> Yes, we could do so. I was thinking of the worst case by giving
> the guest OS a chance to continue (though in a slower mode), if
> something unrecoverable happens to the VINTF/VCMDQ part.

Does that happen? The stuck vcmdq will have stuck entries on it no
matter what, can we actually fully recover from that? Ie re-issue the
commands on another queue?

> > So just don't use it. There is no value if the places where it should
> > work automatically are not functioning.
>
> I thought devm could work when rmmod too, not only when the probe
> fails..

It is limited to cases when the probing driver of the passed struct
device unbinds, including probe failure.

Jason

2024-05-02 19:27:05

by Nicolin Chen

Subject: Re: [PATCH v6 5/6] iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV

On Thu, May 02, 2024 at 09:41:03AM -0300, Jason Gunthorpe wrote:
> On Wed, May 01, 2024 at 10:43:39AM -0700, Nicolin Chen wrote:
> > > It doesn't fix any real race, I'm not sure what this is supposed to be
> > > doing. The cmdq becomes broken and you get an ISR, so before the ISR
> > > it will still post but get stuck, during the ISR it will avoid
> > > posting, and after it will go back to posting?
> > >
> > > Why? Just always post to the Q and let the ISR fix it?
> >
> > Yes, we could do so. I was thinking of the worst case by giving
> > the guest OS a chance to continue (though in a slower mode), if
> > something unrecoverable happens to the VINTF/VCMDQ part.
>
> Does that happen? The stuck vcmdq will have stuck entries on it no
> matter what, can we actually fully recover from that? Ie re-issue the
> commands on another queue?

Well, the handle_vintf0_error() should fix that and recover. And
rethinking about this, if this happens it's likely a SW bug that
we shouldn't ignore.

With that being said, the viommu infrastructure still needs IRQ
forwarding, which is currently missing. I'd need to draft something,
likely on top of Baolu's work.

> > > So just don't use it. There is no value if the places where it should
> > > work automatically are not functioning.
> >
> > I thought devm could work when rmmod too, not only when the probe
> > fails..
>
> It is limited to cases when the probing driver of the passed struct
> device unbinds, including probe failure.

OK. I'll drop all devm_ and add tegra241_cmdqv_device_remove()
instead.

Thanks
Nicolin

2024-05-06 03:52:52

by Nicolin Chen

Subject: Re: [PATCH v6 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF

On Tue, Apr 30, 2024 at 09:17:58PM -0300, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 11:58:44AM -0700, Nicolin Chen wrote:
> > Otherwise, there has to be a get_supported_cmdq callback so batch
> > or its callers can avoid adding unsupported commands in the first
> > place.
>
> If you really feel strongly, the invalidation could be split into
> S1/S2/S1_VM groupings that align with the feature bits and that could
> be passed down from one step above. But I don't think the complexity
> is really needed. It is better to deal with it through the feature
> mechanism.

Hmm, I tried following your design by passing in a CMD_TYPE_xxx
to the tegra241_cmdqv_get_cmdq(), but I found it a little painful
to accommodate these two cases:
 1. TLBI_NH_ASID is issued via arm_smmu_cmdq_issue_cmdlist(), so
    we should not mark it as CMD_TYPE_ALL. Yet, this function is
    used by other commands too. So, either we pass in a type from
    higher callers, or simply check the opcode in that function.
 2. It is a bit tricky to define, from SMMU's P.O.V, a good TYPE
    subset for VCMDQ, since guest-owned VCMDQ does not support
    TLBI_NSNH_ALL.

So, it feels to me that checking against the opcode is still a
straightforward solution. And what I ended up with is somewhat
similar to this v6, yet this time it only checks at batch init
call as your design does.
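
As a rough illustration of that approach (not the attached patch itself), the sketch below picks the queue once, from the first opcode, when the batch is initialized, and then reuses it for every later add. The types, opcode names, and the two-queue layout are hypothetical simplifications, and flushing a full batch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes; the real CMDQ_OP_* values differ. */
enum opcode { OP_CFGI_STE, OP_TLBI_NH_ASID, OP_TLBI_NH_VA, OP_ATC_INV };

struct cmdq {
	const char *name;
};

/* Hypothetical, simplified SMMU with one main queue and one VCMDQ. */
struct smmu {
	struct cmdq main_q;
	struct cmdq vcmdq;
	bool vcmdq_guest_owned;
};

#define BATCH_ENTRIES 4

struct batch {
	enum opcode ops[BATCH_ENTRIES];
	int num;
	struct cmdq *q;		/* chosen once, used for the whole batch */
};

/* Commands a guest-owned VCMDQ accepts in this sketch. */
static bool guest_vcmdq_supports(enum opcode op)
{
	switch (op) {
	case OP_TLBI_NH_ASID:
	case OP_TLBI_NH_VA:
	case OP_ATC_INV:
		return true;
	default:
		return false;
	}
}

/* Select the queue from the first opcode only; later adds trust it. */
static void batch_init(struct smmu *smmu, struct batch *b, enum opcode first_op)
{
	b->num = 0;
	if (!smmu->vcmdq_guest_owned || guest_vcmdq_supports(first_op))
		b->q = &smmu->vcmdq;
	else
		b->q = &smmu->main_q;
}

/* Append a command; the queue is not re-evaluated here. */
static void batch_add(struct batch *b, enum opcode op)
{
	assert(b->num < BATCH_ENTRIES);
	b->ops[b->num++] = op;
}
```

The trade-off is the one discussed in this thread: the check runs once per batch, but it relies on callers not mixing command types within a batch.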

What do you think of this?

Thanks
Nicolin


Attachments:
cmdq_limit_mine.patch (6.84 kB)

2024-05-06 13:00:20

by Jason Gunthorpe

Subject: Re: [PATCH v6 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF

On Sun, May 05, 2024 at 08:52:32PM -0700, Nicolin Chen wrote:
> On Tue, Apr 30, 2024 at 09:17:58PM -0300, Jason Gunthorpe wrote:
> > On Tue, Apr 30, 2024 at 11:58:44AM -0700, Nicolin Chen wrote:
> > > Otherwise, there has to be a get_supported_cmdq callback so batch
> > > or its callers can avoid adding unsupported commands in the first
> > > place.
> >
> > If you really feel strongly, the invalidation could be split into
> > S1/S2/S1_VM groupings that align with the feature bits and that could
> > be passed down from one step above. But I don't think the complexity
> > is really needed. It is better to deal with it through the feature
> > mechanism.
>
> Hmm, I tried following your design by passing in a CMD_TYPE_xxx
> to the tegra241_cmdqv_get_cmdq(), but I found it a little painful
> to accommodate these two cases:
>  1. TLBI_NH_ASID is issued via arm_smmu_cmdq_issue_cmdlist(), so
>     we should not mark it as CMD_TYPE_ALL. Yet, this function is
>     used by other commands too. So, either we pass in a type from
>     higher callers, or simply check the opcode in that function.

Yes, you'd have to pass in the type there too, which makes it more
ugly.

> So, it feels to me that checking against the opcode is still a
> straightforward solution. And what I ended up with is somewhat
> similar to this v6, yet this time it only checks at batch init
> call as your design does.

Well, the only downside is that the commands have to be the same in a
batch, but maybe that is OK anyhow.

Don't forget to take the hunks that fix the queue as well.

Jason