2023-07-19 19:51:21

by Tomasz Jeznach

[permalink] [raw]
Subject: [PATCH 00/13] Linux RISC-V IOMMU Support

The RISC-V IOMMU specification is now ratified as-per the RISC-V international
process [1]. The latest frozen specifcation can be found at:
https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf

At a high-level, the RISC-V IOMMU specification defines:
1) Memory-mapped programming interface
- Mandatory and optional registers layout and description.
- Software guidelines for device initialization and capabilities discovery.
2) In-memory queue interface
- A command-queue used by software to queue commands to the IOMMU.
- A fault/event queue used to bring faults and events to software’s attention.
- A page-request queue used to report “Page Request” messages received from
PCIe devices.
- Message-signalled and wire-signaled interrupt mechanism.
3) In-memory data structures
- Device-context: used to associate a device with an address space and to hold
other per-device parameters used by the IOMMU to perform address translations.
- Process-contexts: used to associate a different virtual address space based on
device provided process identification number.
- MSI page table configuration used to direct an MSI to a guest interrupt file
in an IMSIC. The MSI page table formats are defined by the Advanced Interrupt
Architecture specification [2].

This series introduces complete single-level translation support, including shared
virtual address (SVA), ATS/PRI interfaces in the kernel driver. Patches adding MSI
identity remapping and G-Stage translation (GPA to SPA) are added only to excercise
hardware interfaces, to be complemented with AIA/KVM bindings in follow-up series.

This series is a logical regrouping of series of incremental patches based on
RISC-V International IOMMU Task Group discussions and specification development
process. Original series can be found at the maintainer's repository branch [3].

These patches can also be found in the riscv_iommu_v1 branch at:
https://github.com/tjeznach/linux/tree/riscv_iommu_v1

To test this series, use QEMU/OpenSBI with RISC-V IOMMU implementation available in
the riscv_iommu_v1 branch at:
https://github.com/tjeznach/qemu/tree/riscv_iommu_v1

References:
[1] - https://wiki.riscv.org/display/HOME/Specification+Status
[2] - https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
[3] - https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu-20230719


Anup Patel (1):
dt-bindings: Add RISC-V IOMMU bindings

Tomasz Jeznach (10):
RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
RISC-V: arch/riscv/config: enable RISC-V IOMMU support
MAINTAINERS: Add myself for RISC-V IOMMU driver
RISC-V: drivers/iommu/riscv: Add sysfs interface
RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
RISC-V: drivers/iommu/riscv: Add device context support
RISC-V: drivers/iommu/riscv: Add page table support
RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support.
RISC-V: drivers/iommu/riscv: Add MSI identity remapping
RISC-V: drivers/iommu/riscv: Add G-Stage translation support

.../bindings/iommu/riscv,iommu.yaml | 146 ++
MAINTAINERS | 7 +
arch/riscv/configs/defconfig | 1 +
drivers/iommu/Kconfig | 1 +
drivers/iommu/Makefile | 2 +-
drivers/iommu/io-pgtable.c | 3 +
drivers/iommu/riscv/Kconfig | 22 +
drivers/iommu/riscv/Makefile | 1 +
drivers/iommu/riscv/io_pgtable.c | 266 ++
drivers/iommu/riscv/iommu-bits.h | 704 ++++++
drivers/iommu/riscv/iommu-pci.c | 206 ++
drivers/iommu/riscv/iommu-platform.c | 160 ++
drivers/iommu/riscv/iommu-sysfs.c | 183 ++
drivers/iommu/riscv/iommu.c | 2130 +++++++++++++++++
drivers/iommu/riscv/iommu.h | 165 ++
include/linux/io-pgtable.h | 2 +
16 files changed, 3998 insertions(+), 1 deletion(-)
create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
create mode 100644 drivers/iommu/riscv/Kconfig
create mode 100644 drivers/iommu/riscv/Makefile
create mode 100644 drivers/iommu/riscv/io_pgtable.c
create mode 100644 drivers/iommu/riscv/iommu-bits.h
create mode 100644 drivers/iommu/riscv/iommu-pci.c
create mode 100644 drivers/iommu/riscv/iommu-platform.c
create mode 100644 drivers/iommu/riscv/iommu-sysfs.c
create mode 100644 drivers/iommu/riscv/iommu.c
create mode 100644 drivers/iommu/riscv/iommu.h

--
2.34.1



2023-07-19 19:51:24

by Tomasz Jeznach

[permalink] [raw]
Subject: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

From: Anup Patel <[email protected]>

We add DT bindings document for RISC-V IOMMU platform and PCI devices
defined by the RISC-V IOMMU specification.

Signed-off-by: Anup Patel <[email protected]>
---
.../bindings/iommu/riscv,iommu.yaml | 146 ++++++++++++++++++
1 file changed, 146 insertions(+)
create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
new file mode 100644
index 000000000000..8a9aedb61768
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
@@ -0,0 +1,146 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RISC-V IOMMU Implementation
+
+maintainers:
+ - Tomasz Jeznach <[email protected]>
+
+description:
+ The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
+ which can be a regular platform device or a PCI device connected to
+ the host root port.
+
+ The RISC-V IOMMU provides two stage translation, device directory table,
+ command queue and fault reporting as wired interrupt or MSIx event for
+ both PCI and platform devices.
+
+ Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
+
+properties:
+ compatible:
+ oneOf:
+ - description: RISC-V IOMMU as a platform device
+ items:
+ - enum:
+ - vendor,chip-iommu
+ - const: riscv,iommu
+
+ - description: RISC-V IOMMU as a PCI device connected to root port
+ items:
+ - enum:
+ - vendor,chip-pci-iommu
+ - const: riscv,pci-iommu
+
+ reg:
+ maxItems: 1
+ description:
+ For RISC-V IOMMU as a platform device, this represents the MMIO base
+ address of registers.
+
+ For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
+ details as described in Documentation/devicetree/bindings/pci/pci.txt
+
+ '#iommu-cells':
+ const: 2
+ description: |
+ Each IOMMU specifier represents the base device ID and number of
+ device IDs.
+
+ interrupts:
+ minItems: 1
+ maxItems: 16
+ description:
+ The presence of this property implies that given RISC-V IOMMU uses
+ wired interrupts to notify the RISC-V HARTS (or CPUs).
+
+ msi-parent:
+ description:
+ The presence of this property implies that given RISC-V IOMMU uses
+ MSIx to notify the RISC-V HARTs (or CPUs). This property should be
+ considered only when the interrupts property is absent.
+
+ dma-coherent:
+ description:
+ Present if page table walks and DMA accessed made by the RISC-V IOMMU
+ are cache coherent with the CPU.
+
+ power-domains:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - '#iommu-cells'
+
+additionalProperties: false
+
+examples:
+ - |
+ /* Example 1 (IOMMU platform device with wired interrupts) */
+ immu1: iommu@1bccd000 {
+ compatible = "vendor,chip-iommu", "riscv,iommu";
+ reg = <0x1bccd000 0x1000>;
+ interrupt-parent = <&aplic_smode>;
+ interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
+ #iommu-cells = <2>;
+ };
+
+ /* Device with two IOMMU device IDs, 0 and 7 */
+ master1 {
+ iommus = <&immu1 0 1>, <&immu1 7 1>;
+ };
+
+ - |
+ /* Example 2 (IOMMU platform device with MSIs) */
+ immu2: iommu@1bcdd000 {
+ compatible = "vendor,chip-iommu", "riscv,iommu";
+ reg = <0x1bccd000 0x1000>;
+ msi-parent = <&imsics_smode>;
+ #iommu-cells = <2>;
+ };
+
+ bus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ /* Device with IOMMU device IDs ranging from 32 to 64 */
+ master1 {
+ iommus = <&immu2 32 32>;
+ };
+
+ pcie@40000000 {
+ compatible = "pci-host-cam-generic";
+ device_type = "pci";
+ #address-cells = <3>;
+ #size-cells = <2>;
+ bus-range = <0x0 0x1>;
+
+ /* CPU_PHYSICAL(2) SIZE(2) */
+ reg = <0x0 0x40000000 0x0 0x1000000>;
+
+ /* BUS_ADDRESS(3) CPU_PHYSICAL(2) SIZE(2) */
+ ranges = <0x01000000 0x0 0x01000000 0x0 0x01000000 0x0 0x00010000>,
+ <0x02000000 0x0 0x41000000 0x0 0x41000000 0x0 0x3f000000>;
+
+ #interrupt-cells = <0x1>;
+
+ /* PCI_DEVICE(3) INT#(1) CONTROLLER(PHANDLE) CONTROLLER_DATA(2) */
+ interrupt-map = < 0x0 0x0 0x0 0x1 &aplic_smode 0x4 0x1>,
+ < 0x800 0x0 0x0 0x1 &aplic_smode 0x5 0x1>,
+ <0x1000 0x0 0x0 0x1 &aplic_smode 0x6 0x1>,
+ <0x1800 0x0 0x0 0x1 &aplic_smode 0x7 0x1>;
+
+ /* PCI_DEVICE(3) INT#(1) */
+ interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
+
+ msi-parent = <&imsics_smode>;
+
+ /* Devices with bus number 0-127 are mastered via immu2 */
+ iommu-map = <0x0000 &immu2 0x0000 0x8000>;
+ };
+ };
+...
--
2.34.1


2023-07-19 20:05:20

by Tomasz Jeznach

[permalink] [raw]
Subject: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface

Enable sysfs debug / visibility interface providing restricted
access to hardware registers.

Signed-off-by: Tomasz Jeznach <[email protected]>
---
drivers/iommu/riscv/Makefile | 2 +-
drivers/iommu/riscv/iommu-sysfs.c | 183 ++++++++++++++++++++++++++++++
drivers/iommu/riscv/iommu.c | 7 ++
drivers/iommu/riscv/iommu.h | 2 +
4 files changed, 193 insertions(+), 1 deletion(-)
create mode 100644 drivers/iommu/riscv/iommu-sysfs.c

diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
index 38730c11e4a8..9523eb053cfc 100644
--- a/drivers/iommu/riscv/Makefile
+++ b/drivers/iommu/riscv/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o
\ No newline at end of file
+obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o
\ No newline at end of file
diff --git a/drivers/iommu/riscv/iommu-sysfs.c b/drivers/iommu/riscv/iommu-sysfs.c
new file mode 100644
index 000000000000..f038ea8445c5
--- /dev/null
+++ b/drivers/iommu/riscv/iommu-sysfs.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * IOMMU API for RISC-V architected Ziommu implementations.
+ *
+ * Copyright © 2022-2023 Rivos Inc.
+ *
+ * Author: Tomasz Jeznach <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/compiler.h>
+#include <linux/iommu.h>
+#include <linux/platform_device.h>
+#include <asm/page.h>
+
+#include "iommu.h"
+
+#define sysfs_dev_to_iommu(dev) \
+ container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
+
+static ssize_t address_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);
+ return sprintf(buf, "%llx\n", iommu->reg_phys);
+}
+
+static DEVICE_ATTR_RO(address);
+
+#define ATTR_RD_REG32(name, offset) \
+ ssize_t reg_ ## name ## _show(struct device *dev, \
+ struct device_attribute *attr, char *buf) \
+{ \
+ struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
+ return sprintf(buf, "0x%x\n", \
+ riscv_iommu_readl(iommu, offset)); \
+}
+
+#define ATTR_RD_REG64(name, offset) \
+ ssize_t reg_ ## name ## _show(struct device *dev, \
+ struct device_attribute *attr, char *buf) \
+{ \
+ struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
+ return sprintf(buf, "0x%llx\n", \
+ riscv_iommu_readq(iommu, offset)); \
+}
+
+#define ATTR_WR_REG32(name, offset) \
+ ssize_t reg_ ## name ## _store(struct device *dev, \
+ struct device_attribute *attr, \
+ const char *buf, size_t len) \
+{ \
+ struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
+ unsigned long val; \
+ int ret; \
+ ret = kstrtoul(buf, 0, &val); \
+ if (ret) \
+ return ret; \
+ riscv_iommu_writel(iommu, offset, val); \
+ return len; \
+}
+
+#define ATTR_WR_REG64(name, offset) \
+ ssize_t reg_ ## name ## _store(struct device *dev, \
+ struct device_attribute *attr, \
+ const char *buf, size_t len) \
+{ \
+ struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
+ unsigned long long val; \
+ int ret; \
+ ret = kstrtoull(buf, 0, &val); \
+ if (ret) \
+ return ret; \
+ riscv_iommu_writeq(iommu, offset, val); \
+ return len; \
+}
+
+#define ATTR_RO_REG32(name, offset) \
+static ATTR_RD_REG32(name, offset); \
+static DEVICE_ATTR_RO(reg_ ## name)
+
+#define ATTR_RW_REG32(name, offset) \
+static ATTR_RD_REG32(name, offset); \
+static ATTR_WR_REG32(name, offset); \
+static DEVICE_ATTR_RW(reg_ ## name)
+
+#define ATTR_RO_REG64(name, offset) \
+static ATTR_RD_REG64(name, offset); \
+static DEVICE_ATTR_RO(reg_ ## name)
+
+#define ATTR_RW_REG64(name, offset) \
+static ATTR_RD_REG64(name, offset); \
+static ATTR_WR_REG64(name, offset); \
+static DEVICE_ATTR_RW(reg_ ## name)
+
+ATTR_RO_REG64(cap, RISCV_IOMMU_REG_CAP);
+ATTR_RO_REG64(fctl, RISCV_IOMMU_REG_FCTL);
+ATTR_RO_REG32(cqh, RISCV_IOMMU_REG_CQH);
+ATTR_RO_REG32(cqt, RISCV_IOMMU_REG_CQT);
+ATTR_RO_REG32(cqcsr, RISCV_IOMMU_REG_CQCSR);
+ATTR_RO_REG32(fqh, RISCV_IOMMU_REG_FQH);
+ATTR_RO_REG32(fqt, RISCV_IOMMU_REG_FQT);
+ATTR_RO_REG32(fqcsr, RISCV_IOMMU_REG_FQCSR);
+ATTR_RO_REG32(pqh, RISCV_IOMMU_REG_PQH);
+ATTR_RO_REG32(pqt, RISCV_IOMMU_REG_PQT);
+ATTR_RO_REG32(pqcsr, RISCV_IOMMU_REG_PQCSR);
+ATTR_RO_REG32(ipsr, RISCV_IOMMU_REG_IPSR);
+ATTR_RO_REG32(ivec, RISCV_IOMMU_REG_IVEC);
+ATTR_RW_REG64(tr_iova, RISCV_IOMMU_REG_TR_REQ_IOVA);
+ATTR_RW_REG64(tr_ctrl, RISCV_IOMMU_REG_TR_REQ_CTL);
+ATTR_RW_REG64(tr_response, RISCV_IOMMU_REG_TR_RESPONSE);
+ATTR_RW_REG32(iocntovf, RISCV_IOMMU_REG_IOCOUNTOVF);
+ATTR_RW_REG32(iocntinh, RISCV_IOMMU_REG_IOCOUNTINH);
+ATTR_RW_REG64(iohpmcycles, RISCV_IOMMU_REG_IOHPMCYCLES);
+ATTR_RW_REG64(iohpmevt_1, RISCV_IOMMU_REG_IOHPMEVT(0));
+ATTR_RW_REG64(iohpmevt_2, RISCV_IOMMU_REG_IOHPMEVT(1));
+ATTR_RW_REG64(iohpmevt_3, RISCV_IOMMU_REG_IOHPMEVT(2));
+ATTR_RW_REG64(iohpmevt_4, RISCV_IOMMU_REG_IOHPMEVT(3));
+ATTR_RW_REG64(iohpmevt_5, RISCV_IOMMU_REG_IOHPMEVT(4));
+ATTR_RW_REG64(iohpmevt_6, RISCV_IOMMU_REG_IOHPMEVT(5));
+ATTR_RW_REG64(iohpmevt_7, RISCV_IOMMU_REG_IOHPMEVT(6));
+ATTR_RW_REG64(iohpmctr_1, RISCV_IOMMU_REG_IOHPMCTR(0));
+ATTR_RW_REG64(iohpmctr_2, RISCV_IOMMU_REG_IOHPMCTR(1));
+ATTR_RW_REG64(iohpmctr_3, RISCV_IOMMU_REG_IOHPMCTR(2));
+ATTR_RW_REG64(iohpmctr_4, RISCV_IOMMU_REG_IOHPMCTR(3));
+ATTR_RW_REG64(iohpmctr_5, RISCV_IOMMU_REG_IOHPMCTR(4));
+ATTR_RW_REG64(iohpmctr_6, RISCV_IOMMU_REG_IOHPMCTR(5));
+ATTR_RW_REG64(iohpmctr_7, RISCV_IOMMU_REG_IOHPMCTR(6));
+
+static struct attribute *riscv_iommu_attrs[] = {
+ &dev_attr_address.attr,
+ &dev_attr_reg_cap.attr,
+ &dev_attr_reg_fctl.attr,
+ &dev_attr_reg_cqh.attr,
+ &dev_attr_reg_cqt.attr,
+ &dev_attr_reg_cqcsr.attr,
+ &dev_attr_reg_fqh.attr,
+ &dev_attr_reg_fqt.attr,
+ &dev_attr_reg_fqcsr.attr,
+ &dev_attr_reg_pqh.attr,
+ &dev_attr_reg_pqt.attr,
+ &dev_attr_reg_pqcsr.attr,
+ &dev_attr_reg_ipsr.attr,
+ &dev_attr_reg_ivec.attr,
+ &dev_attr_reg_tr_iova.attr,
+ &dev_attr_reg_tr_ctrl.attr,
+ &dev_attr_reg_tr_response.attr,
+ &dev_attr_reg_iocntovf.attr,
+ &dev_attr_reg_iocntinh.attr,
+ &dev_attr_reg_iohpmcycles.attr,
+ &dev_attr_reg_iohpmctr_1.attr,
+ &dev_attr_reg_iohpmevt_1.attr,
+ &dev_attr_reg_iohpmctr_2.attr,
+ &dev_attr_reg_iohpmevt_2.attr,
+ &dev_attr_reg_iohpmctr_3.attr,
+ &dev_attr_reg_iohpmevt_3.attr,
+ &dev_attr_reg_iohpmctr_4.attr,
+ &dev_attr_reg_iohpmevt_4.attr,
+ &dev_attr_reg_iohpmctr_5.attr,
+ &dev_attr_reg_iohpmevt_5.attr,
+ &dev_attr_reg_iohpmctr_6.attr,
+ &dev_attr_reg_iohpmevt_6.attr,
+ &dev_attr_reg_iohpmctr_7.attr,
+ &dev_attr_reg_iohpmevt_7.attr,
+ NULL,
+};
+
+static struct attribute_group riscv_iommu_group = {
+ .name = "riscv-iommu",
+ .attrs = riscv_iommu_attrs,
+};
+
+const struct attribute_group *riscv_iommu_groups[] = {
+ &riscv_iommu_group,
+ NULL,
+};
+
+int riscv_iommu_sysfs_add(struct riscv_iommu_device *iommu) {
+ return iommu_device_sysfs_add(&iommu->iommu, NULL,
+ riscv_iommu_groups, "riscv-iommu@%llx", iommu->reg_phys);
+}
+
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 8c236242e2cc..31dc3c458e13 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -608,6 +608,7 @@ static const struct iommu_ops riscv_iommu_ops = {
void riscv_iommu_remove(struct riscv_iommu_device *iommu)
{
iommu_device_unregister(&iommu->iommu);
+ iommu_device_sysfs_remove(&iommu->iommu);
riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
}

@@ -646,6 +647,12 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
goto fail;
}

+ ret = riscv_iommu_sysfs_add(iommu);
+ if (ret) {
+ dev_err(dev, "cannot register sysfs interface (%d)\n", ret);
+ goto fail;
+ }
+
ret = iommu_device_register(&iommu->iommu, &riscv_iommu_ops, dev);
if (ret) {
dev_err(dev, "cannot register iommu interface (%d)\n", ret);
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index 7baefd3630b3..7dc9baa59a50 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -112,4 +112,6 @@ static inline void riscv_iommu_writeq(struct riscv_iommu_device *iommu,
int riscv_iommu_init(struct riscv_iommu_device *iommu);
void riscv_iommu_remove(struct riscv_iommu_device *iommu);

+int riscv_iommu_sysfs_add(struct riscv_iommu_device *iommu);
+
#endif
--
2.34.1


2023-07-19 20:05:29

by Tomasz Jeznach

[permalink] [raw]
Subject: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

Enables message or wire signal interrupts for PCIe and platforms devices.

Co-developed-by: Nick Kossifidis <[email protected]>
Signed-off-by: Nick Kossifidis <[email protected]>
Signed-off-by: Tomasz Jeznach <[email protected]>
---
drivers/iommu/riscv/iommu-pci.c | 72 ++++
drivers/iommu/riscv/iommu-platform.c | 66 +++
drivers/iommu/riscv/iommu.c | 604 ++++++++++++++++++++++++++-
drivers/iommu/riscv/iommu.h | 28 ++
4 files changed, 769 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
index c91f963d7a29..9ea0647f7b92 100644
--- a/drivers/iommu/riscv/iommu-pci.c
+++ b/drivers/iommu/riscv/iommu-pci.c
@@ -34,6 +34,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
{
struct device *dev = &pdev->dev;
struct riscv_iommu_device *iommu;
+ u64 icvec;
int ret;

ret = pci_enable_device_mem(pdev);
@@ -67,14 +68,84 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
iommu->dev = dev;
dev_set_drvdata(dev, iommu);

+ /* Check device reported capabilities. */
+ iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
+
+ /* The PCI driver only uses MSIs, make sure the IOMMU supports this */
+ switch (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap)) {
+ case RISCV_IOMMU_CAP_IGS_MSI:
+ case RISCV_IOMMU_CAP_IGS_BOTH:
+ break;
+ default:
+ dev_err(dev, "unable to use message-signaled interrupts\n");
+ ret = -ENODEV;
+ goto fail;
+ }
+
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
pci_set_master(pdev);

+ /* Allocate and assign IRQ vectors for the various events */
+ ret = pci_alloc_irq_vectors(pdev, 1, RISCV_IOMMU_INTR_COUNT, PCI_IRQ_MSIX);
+ if (ret < 0) {
+ dev_err(dev, "unable to allocate irq vectors\n");
+ goto fail;
+ }
+
+ ret = -ENODEV;
+
+ iommu->irq_cmdq = msi_get_virq(dev, RISCV_IOMMU_INTR_CQ);
+ if (!iommu->irq_cmdq) {
+ dev_warn(dev, "no MSI vector %d for the command queue\n",
+ RISCV_IOMMU_INTR_CQ);
+ goto fail;
+ }
+
+ iommu->irq_fltq = msi_get_virq(dev, RISCV_IOMMU_INTR_FQ);
+ if (!iommu->irq_fltq) {
+ dev_warn(dev, "no MSI vector %d for the fault/event queue\n",
+ RISCV_IOMMU_INTR_FQ);
+ goto fail;
+ }
+
+ if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
+ iommu->irq_pm = msi_get_virq(dev, RISCV_IOMMU_INTR_PM);
+ if (!iommu->irq_pm) {
+ dev_warn(dev,
+ "no MSI vector %d for performance monitoring\n",
+ RISCV_IOMMU_INTR_PM);
+ goto fail;
+ }
+ }
+
+ if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
+ iommu->irq_priq = msi_get_virq(dev, RISCV_IOMMU_INTR_PQ);
+ if (!iommu->irq_priq) {
+ dev_warn(dev,
+ "no MSI vector %d for page-request queue\n",
+ RISCV_IOMMU_INTR_PQ);
+ goto fail;
+ }
+ }
+
+ /* Set simple 1:1 mapping for MSI vectors */
+ icvec = FIELD_PREP(RISCV_IOMMU_IVEC_CIV, RISCV_IOMMU_INTR_CQ) |
+ FIELD_PREP(RISCV_IOMMU_IVEC_FIV, RISCV_IOMMU_INTR_FQ);
+
+ if (iommu->cap & RISCV_IOMMU_CAP_HPM)
+ icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PMIV, RISCV_IOMMU_INTR_PM);
+
+ if (iommu->cap & RISCV_IOMMU_CAP_ATS)
+ icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PIV, RISCV_IOMMU_INTR_PQ);
+
+ riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IVEC, icvec);
+
ret = riscv_iommu_init(iommu);
if (!ret)
return ret;

fail:
+ pci_free_irq_vectors(pdev);
pci_clear_master(pdev);
pci_release_regions(pdev);
pci_disable_device(pdev);
@@ -85,6 +156,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
static void riscv_iommu_pci_remove(struct pci_dev *pdev)
{
riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
+ pci_free_irq_vectors(pdev);
pci_clear_master(pdev);
pci_release_regions(pdev);
pci_disable_device(pdev);
diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
index e4e8ca6711e7..35935d3c7ef4 100644
--- a/drivers/iommu/riscv/iommu-platform.c
+++ b/drivers/iommu/riscv/iommu-platform.c
@@ -20,6 +20,8 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
struct device *dev = &pdev->dev;
struct riscv_iommu_device *iommu = NULL;
struct resource *res = NULL;
+ u32 fctl = 0;
+ int irq = 0;
int ret = 0;

iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
@@ -53,6 +55,70 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
goto fail;
}

+ iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
+
+ /* For now we only support WSIs until we have AIA support */
+ ret = FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap);
+ if (ret == RISCV_IOMMU_CAP_IGS_MSI) {
+ dev_err(dev, "IOMMU only supports MSIs\n");
+ goto fail;
+ }
+
+ /* Parse IRQ assignment */
+ irq = platform_get_irq_byname_optional(pdev, "cmdq");
+ if (irq > 0)
+ iommu->irq_cmdq = irq;
+ else {
+ dev_err(dev, "no IRQ provided for the command queue\n");
+ goto fail;
+ }
+
+ irq = platform_get_irq_byname_optional(pdev, "fltq");
+ if (irq > 0)
+ iommu->irq_fltq = irq;
+ else {
+ dev_err(dev, "no IRQ provided for the fault/event queue\n");
+ goto fail;
+ }
+
+ if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
+ irq = platform_get_irq_byname_optional(pdev, "pm");
+ if (irq > 0)
+ iommu->irq_pm = irq;
+ else {
+ dev_err(dev, "no IRQ provided for performance monitoring\n");
+ goto fail;
+ }
+ }
+
+ if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
+ irq = platform_get_irq_byname_optional(pdev, "priq");
+ if (irq > 0)
+ iommu->irq_priq = irq;
+ else {
+ dev_err(dev, "no IRQ provided for the page-request queue\n");
+ goto fail;
+ }
+ }
+
+ /* Make sure fctl.WSI is set */
+ fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
+ fctl |= RISCV_IOMMU_FCTL_WSI;
+ riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
+
+ /* Parse Queue lengts */
+ ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
+ if (!ret)
+ dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
+
+ ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
+ if (!ret)
+ dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
+
+ ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
+ if (!ret)
+ dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
+
dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));

return riscv_iommu_init(iommu);
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 31dc3c458e13..5c4cf9875302 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -45,6 +45,18 @@ static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
module_param(ddt_mode, int, 0644);
MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");

+static int cmdq_length = 1024;
+module_param(cmdq_length, int, 0644);
+MODULE_PARM_DESC(cmdq_length, "Command queue length.");
+
+static int fltq_length = 1024;
+module_param(fltq_length, int, 0644);
+MODULE_PARM_DESC(fltq_length, "Fault queue length.");
+
+static int priq_length = 1024;
+module_param(priq_length, int, 0644);
+MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
+
/* IOMMU PSCID allocation namespace. */
#define RISCV_IOMMU_MAX_PSCID (1U << 20)
static DEFINE_IDA(riscv_iommu_pscids);
@@ -65,6 +77,497 @@ static DEFINE_IDA(riscv_iommu_pscids);
static const struct iommu_domain_ops riscv_iommu_domain_ops;
static const struct iommu_ops riscv_iommu_ops;

+/*
+ * Common queue management routines
+ */
+
+/* Note: offsets are the same for all queues */
+#define Q_HEAD(q) ((q)->qbr + (RISCV_IOMMU_REG_CQH - RISCV_IOMMU_REG_CQB))
+#define Q_TAIL(q) ((q)->qbr + (RISCV_IOMMU_REG_CQT - RISCV_IOMMU_REG_CQB))
+
+static unsigned riscv_iommu_queue_consume(struct riscv_iommu_device *iommu,
+ struct riscv_iommu_queue *q, unsigned *ready)
+{
+ u32 tail = riscv_iommu_readl(iommu, Q_TAIL(q));
+ *ready = q->lui;
+
+ BUG_ON(q->cnt <= tail);
+ if (q->lui <= tail)
+ return tail - q->lui;
+ return q->cnt - q->lui;
+}
+
+static void riscv_iommu_queue_release(struct riscv_iommu_device *iommu,
+ struct riscv_iommu_queue *q, unsigned count)
+{
+ q->lui = (q->lui + count) & (q->cnt - 1);
+ riscv_iommu_writel(iommu, Q_HEAD(q), q->lui);
+}
+
+static u32 riscv_iommu_queue_ctrl(struct riscv_iommu_device *iommu,
+ struct riscv_iommu_queue *q, u32 val)
+{
+ cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
+
+ riscv_iommu_writel(iommu, q->qcr, val);
+ do {
+ val = riscv_iommu_readl(iommu, q->qcr);
+ if (!(val & RISCV_IOMMU_QUEUE_BUSY))
+ break;
+ cpu_relax();
+ } while (get_cycles() < end_cycles);
+
+ return val;
+}
+
+static void riscv_iommu_queue_free(struct riscv_iommu_device *iommu,
+ struct riscv_iommu_queue *q)
+{
+ size_t size = q->len * q->cnt;
+
+ riscv_iommu_queue_ctrl(iommu, q, 0);
+
+ if (q->base) {
+ if (q->in_iomem)
+ iounmap(q->base);
+ else
+ dmam_free_coherent(iommu->dev, size, q->base, q->base_dma);
+ }
+ if (q->irq)
+ free_irq(q->irq, q);
+}
+
+static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data);
+static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data);
+static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data);
+static irqreturn_t riscv_iommu_fltq_process(int irq, void *data);
+static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
+static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
+
+static int riscv_iommu_queue_init(struct riscv_iommu_device *iommu, int queue_id)
+{
+ struct device *dev = iommu->dev;
+ struct riscv_iommu_queue *q = NULL;
+ size_t queue_size = 0;
+ irq_handler_t irq_check;
+ irq_handler_t irq_process;
+ const char *name;
+ int count = 0;
+ int irq = 0;
+ unsigned order = 0;
+ u64 qbr_val = 0;
+ u64 qbr_readback = 0;
+ u64 qbr_paddr = 0;
+ int ret = 0;
+
+ switch (queue_id) {
+ case RISCV_IOMMU_COMMAND_QUEUE:
+ q = &iommu->cmdq;
+ q->len = sizeof(struct riscv_iommu_command);
+ count = iommu->cmdq_len;
+ irq = iommu->irq_cmdq;
+ irq_check = riscv_iommu_cmdq_irq_check;
+ irq_process = riscv_iommu_cmdq_process;
+ q->qbr = RISCV_IOMMU_REG_CQB;
+ q->qcr = RISCV_IOMMU_REG_CQCSR;
+ name = "cmdq";
+ break;
+ case RISCV_IOMMU_FAULT_QUEUE:
+ q = &iommu->fltq;
+ q->len = sizeof(struct riscv_iommu_fq_record);
+ count = iommu->fltq_len;
+ irq = iommu->irq_fltq;
+ irq_check = riscv_iommu_fltq_irq_check;
+ irq_process = riscv_iommu_fltq_process;
+ q->qbr = RISCV_IOMMU_REG_FQB;
+ q->qcr = RISCV_IOMMU_REG_FQCSR;
+ name = "fltq";
+ break;
+ case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
+ q = &iommu->priq;
+ q->len = sizeof(struct riscv_iommu_pq_record);
+ count = iommu->priq_len;
+ irq = iommu->irq_priq;
+ irq_check = riscv_iommu_priq_irq_check;
+ irq_process = riscv_iommu_priq_process;
+ q->qbr = RISCV_IOMMU_REG_PQB;
+ q->qcr = RISCV_IOMMU_REG_PQCSR;
+ name = "priq";
+ break;
+ default:
+ dev_err(dev, "invalid queue interrupt index in queue_init!\n");
+ return -EINVAL;
+ }
+
+ /* Polling not implemented */
+ if (!irq)
+ return -ENODEV;
+
+ /* Allocate queue in memory and set the base register */
+ order = ilog2(count);
+ do {
+ queue_size = q->len * (1ULL << order);
+ q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
+ if (q->base || queue_size < PAGE_SIZE)
+ break;
+
+ order--;
+ } while (1);
+
+ if (!q->base) {
+ dev_err(dev, "failed to allocate %s queue (cnt: %u)\n", name, count);
+ return -ENOMEM;
+ }
+
+ q->cnt = 1ULL << order;
+
+ qbr_val = phys_to_ppn(q->base_dma) |
+ FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
+
+ riscv_iommu_writeq(iommu, q->qbr, qbr_val);
+
+ /*
+ * Queue base registers are WARL, so it's possible that whatever we wrote
+ * there was illegal/not supported by the hw in which case we need to make
+ * sure we set a supported PPN and/or queue size.
+ */
+ qbr_readback = riscv_iommu_readq(iommu, q->qbr);
+ if (qbr_readback == qbr_val)
+ goto irq;
+
+ dmam_free_coherent(dev, queue_size, q->base, q->base_dma);
+
+ /* Get supported queue size */
+ order = FIELD_GET(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, qbr_readback) + 1;
+ q->cnt = 1ULL << order;
+ queue_size = q->len * q->cnt;
+
+ /*
+ * In case we also failed to set PPN, it means the field is hardcoded and the
+ * queue resides in I/O memory instead, so get its physical address and
+ * ioremap it.
+ */
+ qbr_paddr = ppn_to_phys(qbr_readback);
+ if (qbr_paddr != q->base_dma) {
+ dev_info(dev,
+ "hardcoded ppn in %s base register, using io memory for the queue\n",
+ name);
+ dev_info(dev, "queue length for %s set to %i\n", name, q->cnt);
+ q->in_iomem = true;
+ q->base = ioremap(qbr_paddr, queue_size);
+ if (!q->base) {
+ dev_err(dev, "failed to map %s queue (cnt: %u)\n", name, q->cnt);
+ return -ENOMEM;
+ }
+ q->base_dma = qbr_paddr;
+ } else {
+ /*
+ * We only failed to set the queue size, re-try to allocate memory with
+ * the queue size supported by the hw.
+ */
+ dev_info(dev, "hardcoded queue size in %s base register\n", name);
+ dev_info(dev, "retrying with queue length: %i\n", q->cnt);
+ q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
+ if (!q->base) {
+ dev_err(dev, "failed to allocate %s queue (cnt: %u)\n",
+ name, q->cnt);
+ return -ENOMEM;
+ }
+ }
+
+ qbr_val = phys_to_ppn(q->base_dma) |
+ FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
+ riscv_iommu_writeq(iommu, q->qbr, qbr_val);
+
+ /* Final check to make sure hw accepted our write */
+ qbr_readback = riscv_iommu_readq(iommu, q->qbr);
+ if (qbr_readback != qbr_val) {
+ dev_err(dev, "failed to set base register for %s\n", name);
+ goto fail;
+ }
+
+ irq:
+ if (request_threaded_irq(irq, irq_check, irq_process, IRQF_ONESHOT | IRQF_SHARED,
+ dev_name(dev), q)) {
+ dev_err(dev, "fail to request irq %d for %s\n", irq, name);
+ goto fail;
+ }
+
+ q->irq = irq;
+
+ /* Note: All RIO_xQ_EN/IE fields are in the same offsets */
+ ret =
+ riscv_iommu_queue_ctrl(iommu, q,
+ RISCV_IOMMU_QUEUE_ENABLE |
+ RISCV_IOMMU_QUEUE_INTR_ENABLE);
+ if (ret & RISCV_IOMMU_QUEUE_BUSY) {
+ dev_err(dev, "%s init timeout\n", name);
+ ret = -EBUSY;
+ goto fail;
+ }
+
+ return 0;
+
+ fail:
+ riscv_iommu_queue_free(iommu, q);
+ return 0;
+}
+
+/*
+ * I/O MMU Command queue chapter 3.1
+ */
+
+static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
+{
+ cmd->dword0 =
+ FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
+ RISCV_IOMMU_CMD_IOTINVAL_OPCODE) | FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
+ RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);
+ cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
+ u64 addr)
+{
+ cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
+ cmd->dword1 = addr;
+}
+
+static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
+ unsigned pscid)
+{
+ cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
+ RISCV_IOMMU_CMD_IOTINVAL_PSCV;
+}
+
+static inline void riscv_iommu_cmd_inval_set_gscid(struct riscv_iommu_command *cmd,
+ unsigned gscid)
+{
+ cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_GSCID, gscid) |
+ RISCV_IOMMU_CMD_IOTINVAL_GV;
+}
+
+static inline void riscv_iommu_cmd_iofence(struct riscv_iommu_command *cmd)
+{
+ cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
+ FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C);
+ cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_iofence_set_av(struct riscv_iommu_command *cmd,
+ u64 addr, u32 data)
+{
+ cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
+ FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
+ FIELD_PREP(RISCV_IOMMU_CMD_IOFENCE_DATA, data) | RISCV_IOMMU_CMD_IOFENCE_AV;
+ cmd->dword1 = (addr >> 2);
+}
+
+static inline void riscv_iommu_cmd_iodir_inval_ddt(struct riscv_iommu_command *cmd)
+{
+ cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
+ FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT);
+ cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_iodir_inval_pdt(struct riscv_iommu_command *cmd)
+{
+ cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
+ FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT);
+ cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd,
+ unsigned devid)
+{
+ cmd->dword0 |=
+ FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
+}
+
+/* TODO: Convert into lock-less MPSC implementation. */
+static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
+ struct riscv_iommu_command *cmd, bool sync)
+{
+ u32 head, tail, next, last;
+ unsigned long flags;
+
+ spin_lock_irqsave(&iommu->cq_lock, flags);
+ head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) & (iommu->cmdq.cnt - 1);
+ tail = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
+ last = iommu->cmdq.lui;
+ if (tail != last) {
+ spin_unlock_irqrestore(&iommu->cq_lock, flags);
+ /*
+ * FIXME: This is a workaround for dropped MMIO writes/reads on QEMU platform.
+ * While debugging of the problem is still ongoing, this provides
+ * a simple impolementation of try-again policy.
+ * Will be changed to lock-less algorithm in the feature.
+ */
+ dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (1st)\n", last, tail);
+ spin_lock_irqsave(&iommu->cq_lock, flags);
+ tail =
+ riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
+ last = iommu->cmdq.lui;
+ if (tail != last) {
+ spin_unlock_irqrestore(&iommu->cq_lock, flags);
+ dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (2nd)\n", last, tail);
+ spin_lock_irqsave(&iommu->cq_lock, flags);
+ }
+ }
+
+ next = (last + 1) & (iommu->cmdq.cnt - 1);
+ if (next != head) {
+ struct riscv_iommu_command *ptr = iommu->cmdq.base;
+ ptr[last] = *cmd;
+ wmb();
+ riscv_iommu_writel(iommu, RISCV_IOMMU_REG_CQT, next);
+ iommu->cmdq.lui = next;
+ }
+
+ spin_unlock_irqrestore(&iommu->cq_lock, flags);
+
+ if (sync && head != next) {
+ cycles_t start_time = get_cycles();
+ while (1) {
+ last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
+ (iommu->cmdq.cnt - 1);
+ if (head < next && last >= next)
+ break;
+ if (head > next && last < head && last >= next)
+ break;
+ if (RISCV_IOMMU_TIMEOUT < (get_cycles() - start_time)) {
+ dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
+ return false;
+ }
+ cpu_relax();
+ }
+ }
+
+ return next != head;
+}
+
+static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
+ struct riscv_iommu_command *cmd)
+{
+ return riscv_iommu_post_sync(iommu, cmd, false);
+}
+
+static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
+{
+ struct riscv_iommu_command cmd;
+ riscv_iommu_cmd_iofence(&cmd);
+ return riscv_iommu_post_sync(iommu, &cmd, true);
+}
+
+/* Command queue primary interrupt handler */
+static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
+{
+ struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+ struct riscv_iommu_device *iommu =
+ container_of(q, struct riscv_iommu_device, cmdq);
+ u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
+ if (ipsr & RISCV_IOMMU_IPSR_CIP)
+ return IRQ_WAKE_THREAD;
+ return IRQ_NONE;
+}
+
+/* Command queue interrupt hanlder thread function */
+static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
+{
+ struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+ struct riscv_iommu_device *iommu;
+ unsigned ctrl;
+
+ iommu = container_of(q, struct riscv_iommu_device, cmdq);
+
+ /* Error reporting, clear error reports if any. */
+ ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQCSR);
+ if (ctrl & (RISCV_IOMMU_CQCSR_CQMF |
+ RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL)) {
+ riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);
+ dev_warn_ratelimited(iommu->dev,
+ "Command queue error: fault: %d tout: %d err: %d\n",
+ !!(ctrl & RISCV_IOMMU_CQCSR_CQMF),
+ !!(ctrl & RISCV_IOMMU_CQCSR_CMD_TO),
+ !!(ctrl & RISCV_IOMMU_CQCSR_CMD_ILL));
+ }
+
+ /* Clear fault interrupt pending. */
+ riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_CIP);
+
+ return IRQ_HANDLED;
+}
+
+/*
+ * Fault/event queue, chapter 3.2
+ */
+
+static void riscv_iommu_fault_report(struct riscv_iommu_device *iommu,
+ struct riscv_iommu_fq_record *event)
+{
+ unsigned err, devid;
+
+ err = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, event->hdr);
+ devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, event->hdr);
+
+ dev_warn_ratelimited(iommu->dev,
+ "Fault %d devid: %d" " iotval: %llx iotval2: %llx\n", err,
+ devid, event->iotval, event->iotval2);
+}
+
+/* Fault/event queue primary interrupt handler */
+static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data)
+{
+ struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+ struct riscv_iommu_device *iommu =
+ container_of(q, struct riscv_iommu_device, fltq);
+ u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
+ if (ipsr & RISCV_IOMMU_IPSR_FIP)
+ return IRQ_WAKE_THREAD;
+ return IRQ_NONE;
+}
+
+/* Fault queue interrupt hanlder thread function */
+static irqreturn_t riscv_iommu_fltq_process(int irq, void *data)
+{
+ struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+ struct riscv_iommu_device *iommu;
+ struct riscv_iommu_fq_record *events;
+ unsigned cnt, len, idx, ctrl;
+
+ iommu = container_of(q, struct riscv_iommu_device, fltq);
+ events = (struct riscv_iommu_fq_record *)q->base;
+
+ /* Error reporting, clear error reports if any. */
+ ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQCSR);
+ if (ctrl & (RISCV_IOMMU_FQCSR_FQMF | RISCV_IOMMU_FQCSR_FQOF)) {
+ riscv_iommu_queue_ctrl(iommu, &iommu->fltq, ctrl);
+ dev_warn_ratelimited(iommu->dev,
+ "Fault queue error: fault: %d full: %d\n",
+ !!(ctrl & RISCV_IOMMU_FQCSR_FQMF),
+ !!(ctrl & RISCV_IOMMU_FQCSR_FQOF));
+ }
+
+ /* Clear fault interrupt pending. */
+ riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_FIP);
+
+ /* Report fault events. */
+ do {
+ cnt = riscv_iommu_queue_consume(iommu, q, &idx);
+ if (!cnt)
+ break;
+ for (len = 0; len < cnt; idx++, len++)
+ riscv_iommu_fault_report(iommu, &events[idx]);
+ riscv_iommu_queue_release(iommu, q, cnt);
+ } while (1);
+
+ return IRQ_HANDLED;
+}
+
+/*
+ * Page request queue, chapter 3.3
+ */
+
/*
* Register device for IOMMU tracking.
*/
@@ -97,6 +600,54 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
mutex_unlock(&iommu->eps_mutex);
}

+/* Page request interface queue primary interrupt handler */
+static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
+{
+ struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+ struct riscv_iommu_device *iommu =
+ container_of(q, struct riscv_iommu_device, priq);
+ u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
+ if (ipsr & RISCV_IOMMU_IPSR_PIP)
+ return IRQ_WAKE_THREAD;
+ return IRQ_NONE;
+}
+
+/* Page request interface queue interrupt hanlder thread function */
+static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
+{
+ struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+ struct riscv_iommu_device *iommu;
+ struct riscv_iommu_pq_record *requests;
+ unsigned cnt, idx, ctrl;
+
+ iommu = container_of(q, struct riscv_iommu_device, priq);
+ requests = (struct riscv_iommu_pq_record *)q->base;
+
+ /* Error reporting, clear error reports if any. */
+ ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_PQCSR);
+ if (ctrl & (RISCV_IOMMU_PQCSR_PQMF | RISCV_IOMMU_PQCSR_PQOF)) {
+ riscv_iommu_queue_ctrl(iommu, &iommu->priq, ctrl);
+ dev_warn_ratelimited(iommu->dev,
+ "Page request queue error: fault: %d full: %d\n",
+ !!(ctrl & RISCV_IOMMU_PQCSR_PQMF),
+ !!(ctrl & RISCV_IOMMU_PQCSR_PQOF));
+ }
+
+ /* Clear page request interrupt pending. */
+ riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_PIP);
+
+ /* Process page requests. */
+ do {
+ cnt = riscv_iommu_queue_consume(iommu, q, &idx);
+ if (!cnt)
+ break;
+ dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
+ riscv_iommu_queue_release(iommu, q, cnt);
+ } while (1);
+
+ return IRQ_HANDLED;
+}
+
/*
* Endpoint management
*/
@@ -350,7 +901,29 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
unsigned long *start, unsigned long *end,
size_t *pgsize)
{
- /* Command interface not implemented */
+ struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+ struct riscv_iommu_command cmd;
+ unsigned long iova;
+
+ if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
+ return;
+
+ /* Domain not attached to an IOMMU! */
+ BUG_ON(!domain->iommu);
+
+ riscv_iommu_cmd_inval_vma(&cmd);
+ riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
+
+ if (start && end && pgsize) {
+ /* Cover only the range that is needed */
+ for (iova = *start; iova <= *end; iova += *pgsize) {
+ riscv_iommu_cmd_inval_set_addr(&cmd, iova);
+ riscv_iommu_post(domain->iommu, &cmd);
+ }
+ } else {
+ riscv_iommu_post(domain->iommu, &cmd);
+ }
+ riscv_iommu_iofence_sync(domain->iommu);
}

static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
@@ -610,6 +1183,9 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
iommu_device_unregister(&iommu->iommu);
iommu_device_sysfs_remove(&iommu->iommu);
riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
+ riscv_iommu_queue_free(iommu, &iommu->cmdq);
+ riscv_iommu_queue_free(iommu, &iommu->fltq);
+ riscv_iommu_queue_free(iommu, &iommu->priq);
}

int riscv_iommu_init(struct riscv_iommu_device *iommu)
@@ -632,6 +1208,16 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
}
#endif

+ /*
+ * Assign queue lengths from module parameters if not already
+ * set on the device tree.
+ */
+ if (!iommu->cmdq_len)
+ iommu->cmdq_len = cmdq_length;
+ if (!iommu->fltq_len)
+ iommu->fltq_len = fltq_length;
+ if (!iommu->priq_len)
+ iommu->priq_len = priq_length;
/* Clear any pending interrupt flag. */
riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
RISCV_IOMMU_IPSR_CIP |
@@ -639,7 +1225,20 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
spin_lock_init(&iommu->cq_lock);
mutex_init(&iommu->eps_mutex);
+ ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_COMMAND_QUEUE);
+ if (ret)
+ goto fail;
+ ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_FAULT_QUEUE);
+ if (ret)
+ goto fail;
+ if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
+ goto no_ats;
+
+ ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
+ if (ret)
+ goto fail;

+ no_ats:
ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);

if (ret) {
@@ -663,5 +1262,8 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
return 0;
fail:
riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
+ riscv_iommu_queue_free(iommu, &iommu->priq);
+ riscv_iommu_queue_free(iommu, &iommu->fltq);
+ riscv_iommu_queue_free(iommu, &iommu->cmdq);
return ret;
}
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index 7dc9baa59a50..04148a2a8ffd 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -28,6 +28,24 @@
#define IOMMU_PAGE_SIZE_1G BIT_ULL(30)
#define IOMMU_PAGE_SIZE_512G BIT_ULL(39)

+struct riscv_iommu_queue {
+ dma_addr_t base_dma; /* ring buffer bus address */
+ void *base; /* ring buffer pointer */
+ size_t len; /* single item length */
+ u32 cnt; /* items count */
+ u32 lui; /* last used index, consumer/producer share */
+ unsigned qbr; /* queue base register offset */
+ unsigned qcr; /* queue control and status register offset */
+ int irq; /* registered interrupt number */
+ bool in_iomem; /* indicates queue data are in I/O memory */
+};
+
+enum riscv_queue_ids {
+ RISCV_IOMMU_COMMAND_QUEUE = 0,
+ RISCV_IOMMU_FAULT_QUEUE = 1,
+ RISCV_IOMMU_PAGE_REQUEST_QUEUE = 2
+};
+
struct riscv_iommu_device {
struct iommu_device iommu; /* iommu core interface */
struct device *dev; /* iommu hardware */
@@ -42,6 +60,11 @@ struct riscv_iommu_device {
int irq_pm;
int irq_priq;

+ /* Queue lengths */
+ int cmdq_len;
+ int fltq_len;
+ int priq_len;
+
/* supported and enabled hardware capabilities */
u64 cap;

@@ -53,6 +76,11 @@ struct riscv_iommu_device {
unsigned ddt_mode;
bool ddtp_in_iomem;

+ /* hardware queues */
+ struct riscv_iommu_queue cmdq;
+ struct riscv_iommu_queue fltq;
+ struct riscv_iommu_queue priq;
+
/* Connected end-points */
struct rb_root eps;
struct mutex eps_mutex;
--
2.34.1


2023-07-19 20:08:09

by Tomasz Jeznach

[permalink] [raw]
Subject: [PATCH 10/11] RISC-V: drivers/iommu/riscv: Add MSI identity remapping

This change provides basic identity mapping support to
excercise MSI_FLAT hardware capability.

Signed-off-by: Tomasz Jeznach <[email protected]>
---
drivers/iommu/riscv/iommu.c | 81 +++++++++++++++++++++++++++++++++++++
drivers/iommu/riscv/iommu.h | 3 ++
2 files changed, 84 insertions(+)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 6042c35be3ca..7b3e3e135cf6 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -61,6 +61,9 @@ MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
#define RISCV_IOMMU_MAX_PSCID (1U << 20)
static DEFINE_IDA(riscv_iommu_pscids);

+/* TODO: Enable MSI remapping */
+#define RISCV_IMSIC_BASE 0x28000000
+
/* 1 second */
#define RISCV_IOMMU_TIMEOUT riscv_timebase

@@ -932,6 +935,72 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
* Endpoint management
*/

+static int riscv_iommu_enable_ir(struct riscv_iommu_endpoint *ep)
+{
+ struct riscv_iommu_device *iommu = ep->iommu;
+ struct iommu_resv_region *entry;
+ struct irq_domain *msi_domain;
+ u64 val;
+ int i;
+
+ /* Initialize MSI remapping */
+ if (!ep->dc || !(iommu->cap & RISCV_IOMMU_CAP_MSI_FLAT))
+ return 0;
+
+ ep->msi_root = (struct riscv_iommu_msi_pte *)get_zeroed_page(GFP_KERNEL);
+ if (!ep->msi_root)
+ return -ENOMEM;
+
+ for (i = 0; i < 256; i++) {
+ ep->msi_root[i].pte = RISCV_IOMMU_MSI_PTE_V |
+ FIELD_PREP(RISCV_IOMMU_MSI_PTE_M, 3) |
+ phys_to_ppn(RISCV_IMSIC_BASE + i * PAGE_SIZE);
+ }
+
+ entry = iommu_alloc_resv_region(RISCV_IMSIC_BASE, PAGE_SIZE * 256, 0,
+ IOMMU_RESV_SW_MSI, GFP_KERNEL);
+ if (entry)
+ list_add_tail(&entry->list, &ep->regions);
+
+ val = virt_to_pfn(ep->msi_root) |
+ FIELD_PREP(RISCV_IOMMU_DC_MSIPTP_MODE, RISCV_IOMMU_DC_MSIPTP_MODE_FLAT);
+ ep->dc->msiptp = cpu_to_le64(val);
+
+ /* Single page of MSIPTP, 256 IMSIC files */
+ ep->dc->msi_addr_mask = cpu_to_le64(255);
+ ep->dc->msi_addr_pattern = cpu_to_le64(RISCV_IMSIC_BASE >> 12);
+ wmb();
+
+ /* set msi domain for the device as isolated. hack. */
+ msi_domain = dev_get_msi_domain(ep->dev);
+ if (msi_domain) {
+ msi_domain->flags |= IRQ_DOMAIN_FLAG_ISOLATED_MSI;
+ }
+
+ dev_dbg(ep->dev, "RV-IR enabled\n");
+
+ ep->ir_enabled = true;
+
+ return 0;
+}
+
+static void riscv_iommu_disable_ir(struct riscv_iommu_endpoint *ep)
+{
+ if (!ep->ir_enabled)
+ return;
+
+ ep->dc->msi_addr_pattern = 0ULL;
+ ep->dc->msi_addr_mask = 0ULL;
+ ep->dc->msiptp = 0ULL;
+ wmb();
+
+ dev_dbg(ep->dev, "RV-IR disabled\n");
+
+ free_pages((unsigned long)ep->msi_root, 0);
+ ep->msi_root = NULL;
+ ep->ir_enabled = false;
+}
+
/* Endpoint features/capabilities */
static void riscv_iommu_disable_ep(struct riscv_iommu_endpoint *ep)
{
@@ -1226,6 +1295,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)

mutex_init(&ep->lock);
INIT_LIST_HEAD(&ep->domain);
+ INIT_LIST_HEAD(&ep->regions);

if (dev_is_pci(dev)) {
ep->devid = pci_dev_id(to_pci_dev(dev));
@@ -1248,6 +1318,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
dev_iommu_priv_set(dev, ep);
riscv_iommu_add_device(iommu, dev);
riscv_iommu_enable_ep(ep);
+ riscv_iommu_enable_ir(ep);

return &iommu->iommu;
}
@@ -1279,6 +1350,7 @@ static void riscv_iommu_release_device(struct device *dev)
riscv_iommu_iodir_inv_devid(iommu, ep->devid);
}

+ riscv_iommu_disable_ir(ep);
riscv_iommu_disable_ep(ep);

/* Remove endpoint from IOMMU tracking structures */
@@ -1301,6 +1373,15 @@ static struct iommu_group *riscv_iommu_device_group(struct device *dev)

static void riscv_iommu_get_resv_regions(struct device *dev, struct list_head *head)
{
+ struct iommu_resv_region *entry, *new_entry;
+ struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+
+ list_for_each_entry(entry, &ep->regions, list) {
+ new_entry = kmemdup(entry, sizeof(*entry), GFP_KERNEL);
+ if (new_entry)
+ list_add_tail(&new_entry->list, head);
+ }
+
iommu_dma_get_resv_regions(dev, head);
}

diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index 83e8d00fd0f8..55418a1144fb 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -117,14 +117,17 @@ struct riscv_iommu_endpoint {
struct riscv_iommu_dc *dc; /* device context pointer */
struct riscv_iommu_pc *pc; /* process context root, valid if pasid_enabled is true */
struct riscv_iommu_device *iommu; /* parent iommu device */
+ struct riscv_iommu_msi_pte *msi_root; /* interrupt re-mapping */

struct mutex lock;
struct list_head domain; /* endpoint attached managed domain */
+ struct list_head regions; /* reserved regions, interrupt remapping window */

/* end point info bits */
unsigned pasid_bits;
unsigned pasid_feat;
bool pasid_enabled;
+ bool ir_enabled;
};

/* Helper functions and macros */
--
2.34.1


2023-07-19 20:13:42

by Tomasz Jeznach

[permalink] [raw]
Subject: [PATCH 09/11] RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support.

Introduces SVA (Shared Virtual Address) for RISC-V IOMMU, with
ATS/PRI services for capable devices.

Co-developed-by: Sebastien Boeuf <[email protected]>
Signed-off-by: Sebastien Boeuf <[email protected]>
Signed-off-by: Tomasz Jeznach <[email protected]>
---
drivers/iommu/riscv/iommu.c | 601 +++++++++++++++++++++++++++++++++++-
drivers/iommu/riscv/iommu.h | 14 +
2 files changed, 610 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 2ef6952a2109..6042c35be3ca 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -384,6 +384,89 @@ static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd
FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
}

+static inline void riscv_iommu_cmd_iodir_set_pid(struct riscv_iommu_command *cmd,
+ unsigned pasid)
+{
+ cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IODIR_PID, pasid);
+}
+
+static void riscv_iommu_cmd_ats_inval(struct riscv_iommu_command *cmd)
+{
+ cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_ATS_OPCODE) |
+ FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_ATS_FUNC_INVAL);
+ cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_ats_prgr(struct riscv_iommu_command *cmd)
+{
+ cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_ATS_OPCODE) |
+ FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_ATS_FUNC_PRGR);
+ cmd->dword1 = 0;
+}
+
+static void riscv_iommu_cmd_ats_set_rid(struct riscv_iommu_command *cmd, u32 rid)
+{
+ cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_RID, rid);
+}
+
+static void riscv_iommu_cmd_ats_set_pid(struct riscv_iommu_command *cmd, u32 pid)
+{
+ cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_PID, pid) | RISCV_IOMMU_CMD_ATS_PV;
+}
+
+static void riscv_iommu_cmd_ats_set_dseg(struct riscv_iommu_command *cmd, u8 seg)
+{
+ cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_DSEG, seg) | RISCV_IOMMU_CMD_ATS_DSV;
+}
+
+static void riscv_iommu_cmd_ats_set_payload(struct riscv_iommu_command *cmd, u64 payload)
+{
+ cmd->dword1 = payload;
+}
+
+/* Prepare the ATS invalidation payload */
+static unsigned long riscv_iommu_ats_inval_payload(unsigned long start,
+ unsigned long end, bool global_inv)
+{
+ size_t len = end - start + 1;
+ unsigned long payload = 0;
+
+ /*
+ * PCI Express specification
+ * Section 10.2.3.2 Translation Range Size (S) Field
+ */
+ if (len < PAGE_SIZE)
+ len = PAGE_SIZE;
+ else
+ len = __roundup_pow_of_two(len);
+
+ payload = (start & ~(len - 1)) | (((len - 1) >> 12) << 11);
+
+ if (global_inv)
+ payload |= RISCV_IOMMU_CMD_ATS_INVAL_G;
+
+ return payload;
+}
+
+/* Prepare the ATS invalidation payload for all translations to be invalidated. */
+static unsigned long riscv_iommu_ats_inval_all_payload(bool global_inv)
+{
+ unsigned long payload = GENMASK_ULL(62, 11);
+
+ if (global_inv)
+ payload |= RISCV_IOMMU_CMD_ATS_INVAL_G;
+
+ return payload;
+}
+
+/* Prepare the ATS "Page Request Group Response" payload */
+static unsigned long riscv_iommu_ats_prgr_payload(u16 dest_id, u8 resp_code, u16 grp_idx)
+{
+ return FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_DST_ID, dest_id) |
+ FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE, resp_code) |
+ FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_PRG_INDEX, grp_idx);
+}
+
/* TODO: Convert into lock-less MPSC implementation. */
static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
struct riscv_iommu_command *cmd, bool sync)
@@ -460,6 +543,16 @@ static bool riscv_iommu_iodir_inv_devid(struct riscv_iommu_device *iommu, unsign
return riscv_iommu_post(iommu, &cmd);
}

+static bool riscv_iommu_iodir_inv_pasid(struct riscv_iommu_device *iommu,
+ unsigned devid, unsigned pasid)
+{
+ struct riscv_iommu_command cmd;
+ riscv_iommu_cmd_iodir_inval_pdt(&cmd);
+ riscv_iommu_cmd_iodir_set_did(&cmd, devid);
+ riscv_iommu_cmd_iodir_set_pid(&cmd, pasid);
+ return riscv_iommu_post(iommu, &cmd);
+}
+
static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
{
struct riscv_iommu_command cmd;
@@ -467,6 +560,62 @@ static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
return riscv_iommu_post_sync(iommu, &cmd, true);
}

+static void riscv_iommu_mm_invalidate(struct mmu_notifier *mn,
+ struct mm_struct *mm, unsigned long start,
+ unsigned long end)
+{
+ struct riscv_iommu_command cmd;
+ struct riscv_iommu_endpoint *endpoint;
+ struct riscv_iommu_domain *domain =
+ container_of(mn, struct riscv_iommu_domain, mn);
+ unsigned long iova;
+ /*
+ * The mm_types defines vm_end as the first byte after the end address,
+ * different from IOMMU subsystem using the last address of an address
+ * range. So do a simple translation here by updating what end means.
+ */
+ unsigned long payload = riscv_iommu_ats_inval_payload(start, end - 1, true);
+
+ riscv_iommu_cmd_inval_vma(&cmd);
+ riscv_iommu_cmd_inval_set_gscid(&cmd, 0);
+ riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
+ if (end > start) {
+ /* Cover only the range that is needed */
+ for (iova = start; iova < end; iova += PAGE_SIZE) {
+ riscv_iommu_cmd_inval_set_addr(&cmd, iova);
+ riscv_iommu_post(domain->iommu, &cmd);
+ }
+ } else {
+ riscv_iommu_post(domain->iommu, &cmd);
+ }
+
+ riscv_iommu_iofence_sync(domain->iommu);
+
+ /* ATS invalidation for every device and for specific translation range. */
+ list_for_each_entry(endpoint, &domain->endpoints, domain) {
+ if (!endpoint->pasid_enabled)
+ continue;
+
+ riscv_iommu_cmd_ats_inval(&cmd);
+ riscv_iommu_cmd_ats_set_dseg(&cmd, endpoint->domid);
+ riscv_iommu_cmd_ats_set_rid(&cmd, endpoint->devid);
+ riscv_iommu_cmd_ats_set_pid(&cmd, domain->pasid);
+ riscv_iommu_cmd_ats_set_payload(&cmd, payload);
+ riscv_iommu_post(domain->iommu, &cmd);
+ }
+ riscv_iommu_iofence_sync(domain->iommu);
+}
+
+static void riscv_iommu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+ /* TODO: removed from notifier, cleanup PSCID mapping, flush IOTLB */
+}
+
+static const struct mmu_notifier_ops riscv_iommu_mmuops = {
+ .release = riscv_iommu_mm_release,
+ .invalidate_range = riscv_iommu_mm_invalidate,
+};
+
/* Command queue primary interrupt handler */
static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
{
@@ -608,6 +757,128 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
mutex_unlock(&iommu->eps_mutex);
}

+/*
+ * Get device reference based on device identifier (requester id).
+ * Decrement reference count with put_device() call.
+ */
+static struct device *riscv_iommu_get_device(struct riscv_iommu_device *iommu,
+ unsigned devid)
+{
+ struct rb_node *node;
+ struct riscv_iommu_endpoint *ep;
+ struct device *dev = NULL;
+
+ mutex_lock(&iommu->eps_mutex);
+
+ node = iommu->eps.rb_node;
+ while (node && !dev) {
+ ep = rb_entry(node, struct riscv_iommu_endpoint, node);
+ if (ep->devid < devid)
+ node = node->rb_right;
+ else if (ep->devid > devid)
+ node = node->rb_left;
+ else
+ dev = get_device(ep->dev);
+ }
+
+ mutex_unlock(&iommu->eps_mutex);
+
+ return dev;
+}
+
+static int riscv_iommu_ats_prgr(struct device *dev, struct iommu_page_response *msg)
+{
+ struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+ struct riscv_iommu_command cmd;
+ u8 resp_code;
+ unsigned long payload;
+
+ switch (msg->code) {
+ case IOMMU_PAGE_RESP_SUCCESS:
+ resp_code = 0b0000;
+ break;
+ case IOMMU_PAGE_RESP_INVALID:
+ resp_code = 0b0001;
+ break;
+ case IOMMU_PAGE_RESP_FAILURE:
+ resp_code = 0b1111;
+ break;
+ }
+ payload = riscv_iommu_ats_prgr_payload(ep->devid, resp_code, msg->grpid);
+
+ /* ATS Page Request Group Response */
+ riscv_iommu_cmd_ats_prgr(&cmd);
+ riscv_iommu_cmd_ats_set_dseg(&cmd, ep->domid);
+ riscv_iommu_cmd_ats_set_rid(&cmd, ep->devid);
+ if (msg->flags & IOMMU_PAGE_RESP_PASID_VALID)
+ riscv_iommu_cmd_ats_set_pid(&cmd, msg->pasid);
+ riscv_iommu_cmd_ats_set_payload(&cmd, payload);
+ riscv_iommu_post(ep->iommu, &cmd);
+
+ return 0;
+}
+
+static void riscv_iommu_page_request(struct riscv_iommu_device *iommu,
+ struct riscv_iommu_pq_record *req)
+{
+ struct iommu_fault_event event = { 0 };
+ struct iommu_fault_page_request *prm = &event.fault.prm;
+ int ret;
+ struct device *dev;
+ unsigned devid = FIELD_GET(RISCV_IOMMU_PREQ_HDR_DID, req->hdr);
+
+ /* Ignore PGR Stop marker. */
+ if ((req->payload & RISCV_IOMMU_PREQ_PAYLOAD_M) == RISCV_IOMMU_PREQ_PAYLOAD_L)
+ return;
+
+ dev = riscv_iommu_get_device(iommu, devid);
+ if (!dev) {
+ /* TODO: Handle invalid page request */
+ return;
+ }
+
+ event.fault.type = IOMMU_FAULT_PAGE_REQ;
+
+ if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_L)
+ prm->flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+ if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_W)
+ prm->perm |= IOMMU_FAULT_PERM_WRITE;
+ if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_R)
+ prm->perm |= IOMMU_FAULT_PERM_READ;
+
+ prm->grpid = FIELD_GET(RISCV_IOMMU_PREQ_PRG_INDEX, req->payload);
+ prm->addr = FIELD_GET(RISCV_IOMMU_PREQ_UADDR, req->payload) << PAGE_SHIFT;
+
+ if (req->hdr & RISCV_IOMMU_PREQ_HDR_PV) {
+ prm->flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+ /* TODO: where to find this bit */
+ prm->flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
+ prm->pasid = FIELD_GET(RISCV_IOMMU_PREQ_HDR_PID, req->hdr);
+ }
+
+ ret = iommu_report_device_fault(dev, &event);
+ if (ret) {
+ struct iommu_page_response resp = {
+ .grpid = prm->grpid,
+ .code = IOMMU_PAGE_RESP_FAILURE,
+ };
+ if (prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID) {
+ resp.flags |= IOMMU_PAGE_RESP_PASID_VALID;
+ resp.pasid = prm->pasid;
+ }
+ riscv_iommu_ats_prgr(dev, &resp);
+ }
+
+ put_device(dev);
+}
+
+static int riscv_iommu_page_response(struct device *dev,
+ struct iommu_fault_event *evt,
+ struct iommu_page_response *msg)
+{
+ return riscv_iommu_ats_prgr(dev, msg);
+}
+
/* Page request interface queue primary interrupt handler */
static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
{
@@ -626,7 +897,7 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
struct riscv_iommu_device *iommu;
struct riscv_iommu_pq_record *requests;
- unsigned cnt, idx, ctrl;
+ unsigned cnt, len, idx, ctrl;

iommu = container_of(q, struct riscv_iommu_device, priq);
requests = (struct riscv_iommu_pq_record *)q->base;
@@ -649,7 +920,8 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
cnt = riscv_iommu_queue_consume(iommu, q, &idx);
if (!cnt)
break;
- dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
+ for (len = 0; len < cnt; idx++, len++)
+ riscv_iommu_page_request(iommu, &requests[idx]);
riscv_iommu_queue_release(iommu, q, cnt);
} while (1);

@@ -660,6 +932,169 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
* Endpoint management
*/

+/* Endpoint features/capabilities */
+static void riscv_iommu_disable_ep(struct riscv_iommu_endpoint *ep)
+{
+ struct pci_dev *pdev;
+
+ if (!dev_is_pci(ep->dev))
+ return;
+
+ pdev = to_pci_dev(ep->dev);
+
+ if (ep->pasid_enabled) {
+ pci_disable_ats(pdev);
+ pci_disable_pri(pdev);
+ pci_disable_pasid(pdev);
+ ep->pasid_enabled = false;
+ }
+}
+
+static void riscv_iommu_enable_ep(struct riscv_iommu_endpoint *ep)
+{
+ int rc, feat, num;
+ struct pci_dev *pdev;
+ struct device *dev = ep->dev;
+
+ if (!dev_is_pci(dev))
+ return;
+
+ if (!ep->iommu->iommu.max_pasids)
+ return;
+
+ pdev = to_pci_dev(dev);
+
+ if (!pci_ats_supported(pdev))
+ return;
+
+ if (!pci_pri_supported(pdev))
+ return;
+
+ feat = pci_pasid_features(pdev);
+ if (feat < 0)
+ return;
+
+ num = pci_max_pasids(pdev);
+ if (!num) {
+ dev_warn(dev, "Can't enable PASID (num: %d)\n", num);
+ return;
+ }
+
+ if (num > ep->iommu->iommu.max_pasids)
+ num = ep->iommu->iommu.max_pasids;
+
+ rc = pci_enable_pasid(pdev, feat);
+ if (rc) {
+ dev_warn(dev, "Can't enable PASID (rc: %d)\n", rc);
+ return;
+ }
+
+ rc = pci_reset_pri(pdev);
+ if (rc) {
+ dev_warn(dev, "Can't reset PRI (rc: %d)\n", rc);
+ pci_disable_pasid(pdev);
+ return;
+ }
+
+ /* TODO: Get supported PRI queue length, hard-code to 32 entries */
+ rc = pci_enable_pri(pdev, 32);
+ if (rc) {
+ dev_warn(dev, "Can't enable PRI (rc: %d)\n", rc);
+ pci_disable_pasid(pdev);
+ return;
+ }
+
+ rc = pci_enable_ats(pdev, PAGE_SHIFT);
+ if (rc) {
+ dev_warn(dev, "Can't enable ATS (rc: %d)\n", rc);
+ pci_disable_pri(pdev);
+ pci_disable_pasid(pdev);
+ return;
+ }
+
+ ep->pc = (struct riscv_iommu_pc *)get_zeroed_page(GFP_KERNEL);
+ if (!ep->pc) {
+ pci_disable_ats(pdev);
+ pci_disable_pri(pdev);
+ pci_disable_pasid(pdev);
+ return;
+ }
+
+ ep->pasid_enabled = true;
+ ep->pasid_feat = feat;
+ ep->pasid_bits = ilog2(num);
+
+ dev_dbg(ep->dev, "PASID/ATS support enabled, %d bits\n", ep->pasid_bits);
+}
+
+static int riscv_iommu_enable_sva(struct device *dev)
+{
+ int ret;
+ struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+
+ if (!ep || !ep->iommu || !ep->iommu->pq_work)
+ return -EINVAL;
+
+ if (!ep->pasid_enabled)
+ return -ENODEV;
+
+ ret = iopf_queue_add_device(ep->iommu->pq_work, dev);
+ if (ret)
+ return ret;
+
+ return iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
+}
+
+static int riscv_iommu_disable_sva(struct device *dev)
+{
+ int ret;
+ struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+
+ ret = iommu_unregister_device_fault_handler(dev);
+ if (!ret)
+ ret = iopf_queue_remove_device(ep->iommu->pq_work, dev);
+
+ return ret;
+}
+
+static int riscv_iommu_enable_iopf(struct device *dev)
+{
+ struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+
+ if (ep && ep->pasid_enabled)
+ return 0;
+
+ return -EINVAL;
+}
+
+static int riscv_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+ switch (feat) {
+ case IOMMU_DEV_FEAT_IOPF:
+ return riscv_iommu_enable_iopf(dev);
+
+ case IOMMU_DEV_FEAT_SVA:
+ return riscv_iommu_enable_sva(dev);
+
+ default:
+ return -ENODEV;
+ }
+}
+
+static int riscv_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+ switch (feat) {
+ case IOMMU_DEV_FEAT_IOPF:
+ return 0;
+
+ case IOMMU_DEV_FEAT_SVA:
+ return riscv_iommu_disable_sva(dev);
+
+ default:
+ return -ENODEV;
+ }
+}
+
static int riscv_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
{
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -812,6 +1247,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)

dev_iommu_priv_set(dev, ep);
riscv_iommu_add_device(iommu, dev);
+ riscv_iommu_enable_ep(ep);

return &iommu->iommu;
}
@@ -843,6 +1279,8 @@ static void riscv_iommu_release_device(struct device *dev)
riscv_iommu_iodir_inv_devid(iommu, ep->devid);
}

+ riscv_iommu_disable_ep(ep);
+
/* Remove endpoint from IOMMU tracking structures */
mutex_lock(&iommu->eps_mutex);
rb_erase(&ep->node, &iommu->eps);
@@ -878,7 +1316,8 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
type != IOMMU_DOMAIN_DMA_FQ &&
type != IOMMU_DOMAIN_UNMANAGED &&
type != IOMMU_DOMAIN_IDENTITY &&
- type != IOMMU_DOMAIN_BLOCKED)
+ type != IOMMU_DOMAIN_BLOCKED &&
+ type != IOMMU_DOMAIN_SVA)
return NULL;

domain = kzalloc(sizeof(*domain), GFP_KERNEL);
@@ -906,6 +1345,9 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
pr_warn("IOMMU domain is not empty!\n");
}

+ if (domain->mn.ops && iommu_domain->mm)
+ mmu_notifier_unregister(&domain->mn, iommu_domain->mm);
+
if (domain->pgtbl.cookie)
free_io_pgtable_ops(&domain->pgtbl.ops);

@@ -1023,14 +1465,29 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
*/
val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);

- dc->ta = cpu_to_le64(val);
- dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+ if (ep->pasid_enabled) {
+ ep->pc[0].ta = cpu_to_le64(val | RISCV_IOMMU_PC_TA_V);
+ ep->pc[0].fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+ dc->ta = 0;
+ dc->fsc = cpu_to_le64(virt_to_pfn(ep->pc) |
+ FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8));
+ } else {
+ dc->ta = cpu_to_le64(val);
+ dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+ }

wmb();

/* Mark device context as valid, synchronise device context cache. */
val = RISCV_IOMMU_DC_TC_V;

+ if (ep->pasid_enabled) {
+ val |= RISCV_IOMMU_DC_TC_EN_ATS |
+ RISCV_IOMMU_DC_TC_EN_PRI |
+ RISCV_IOMMU_DC_TC_DPE |
+ RISCV_IOMMU_DC_TC_PDTV;
+ }
+
if (ep->iommu->cap & RISCV_IOMMU_CAP_AMO) {
val |= RISCV_IOMMU_DC_TC_GADE |
RISCV_IOMMU_DC_TC_SADE;
@@ -1051,13 +1508,107 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
return 0;
}

+static int riscv_iommu_set_dev_pasid(struct iommu_domain *iommu_domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+ struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+ u64 ta, fsc;
+
+ if (!iommu_domain || !iommu_domain->mm)
+ return -EINVAL;
+
+ /* Driver uses TC.DPE mode, PASID #0 is incorrect. */
+ if (pasid == 0)
+ return -EINVAL;
+
+ /* Incorrect domain identifier */
+ if ((int)domain->pscid < 0)
+ return -ENOMEM;
+
+ /* Process Context table should be set for pasid enabled endpoints. */
+ if (!ep || !ep->pasid_enabled || !ep->dc || !ep->pc)
+ return -ENODEV;
+
+ domain->pasid = pasid;
+ domain->iommu = ep->iommu;
+ domain->mn.ops = &riscv_iommu_mmuops;
+
+ /* register mm notifier */
+ if (mmu_notifier_register(&domain->mn, iommu_domain->mm))
+ return -ENODEV;
+
+ /* TODO: get SXL value for the process, use 32 bit or SATP mode */
+ fsc = virt_to_pfn(iommu_domain->mm->pgd) | satp_mode;
+ ta = RISCV_IOMMU_PC_TA_V | FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid);
+
+ fsc = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].fsc), cpu_to_le64(fsc)));
+ ta = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].ta), cpu_to_le64(ta)));
+
+ wmb();
+
+ if (ta & RISCV_IOMMU_PC_TA_V) {
+ riscv_iommu_iodir_inv_pasid(ep->iommu, ep->devid, pasid);
+ riscv_iommu_iofence_sync(ep->iommu);
+ }
+
+ dev_info(dev, "domain type %d attached w/ PSCID %u PASID %u\n",
+ domain->domain.type, domain->pscid, domain->pasid);
+
+ return 0;
+}
+
+static void riscv_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+ struct riscv_iommu_command cmd;
+ unsigned long payload = riscv_iommu_ats_inval_all_payload(false);
+ u64 ta;
+
+ /* invalidate TA.V */
+ ta = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].ta), 0));
+
+ wmb();
+
+ dev_info(dev, "domain removed w/ PSCID %u PASID %u\n",
+ (unsigned)FIELD_GET(RISCV_IOMMU_PC_TA_PSCID, ta), pasid);
+
+ /* 1. invalidate PDT entry */
+ riscv_iommu_iodir_inv_pasid(ep->iommu, ep->devid, pasid);
+
+ /* 2. invalidate all matching IOATC entries (if PASID was valid) */
+ if (ta & RISCV_IOMMU_PC_TA_V) {
+ riscv_iommu_cmd_inval_vma(&cmd);
+ riscv_iommu_cmd_inval_set_gscid(&cmd, 0);
+ riscv_iommu_cmd_inval_set_pscid(&cmd,
+ FIELD_GET(RISCV_IOMMU_PC_TA_PSCID, ta));
+ riscv_iommu_post(ep->iommu, &cmd);
+ }
+
+ /* 3. Wait IOATC flush to happen */
+ riscv_iommu_iofence_sync(ep->iommu);
+
+ /* 4. ATS invalidation */
+ riscv_iommu_cmd_ats_inval(&cmd);
+ riscv_iommu_cmd_ats_set_dseg(&cmd, ep->domid);
+ riscv_iommu_cmd_ats_set_rid(&cmd, ep->devid);
+ riscv_iommu_cmd_ats_set_pid(&cmd, pasid);
+ riscv_iommu_cmd_ats_set_payload(&cmd, payload);
+ riscv_iommu_post(ep->iommu, &cmd);
+
+ /* 5. Wait DevATC flush to happen */
+ riscv_iommu_iofence_sync(ep->iommu);
+}
+
static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
unsigned long *start, unsigned long *end,
size_t *pgsize)
{
struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
struct riscv_iommu_command cmd;
+ struct riscv_iommu_endpoint *endpoint;
unsigned long iova;
+ unsigned long payload;

if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
return;
@@ -1065,6 +1616,12 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
/* Domain not attached to an IOMMU! */
BUG_ON(!domain->iommu);

+ if (start && end) {
+ payload = riscv_iommu_ats_inval_payload(*start, *end, true);
+ } else {
+ payload = riscv_iommu_ats_inval_all_payload(true);
+ }
+
riscv_iommu_cmd_inval_vma(&cmd);
riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);

@@ -1078,6 +1635,20 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
riscv_iommu_post(domain->iommu, &cmd);
}
riscv_iommu_iofence_sync(domain->iommu);
+
+ /* ATS invalidation for every device and for every translation */
+ list_for_each_entry(endpoint, &domain->endpoints, domain) {
+ if (!endpoint->pasid_enabled)
+ continue;
+
+ riscv_iommu_cmd_ats_inval(&cmd);
+ riscv_iommu_cmd_ats_set_dseg(&cmd, endpoint->domid);
+ riscv_iommu_cmd_ats_set_rid(&cmd, endpoint->devid);
+ riscv_iommu_cmd_ats_set_pid(&cmd, domain->pasid);
+ riscv_iommu_cmd_ats_set_payload(&cmd, payload);
+ riscv_iommu_post(domain->iommu, &cmd);
+ }
+ riscv_iommu_iofence_sync(domain->iommu);
}

static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
@@ -1310,6 +1881,7 @@ static int riscv_iommu_enable(struct riscv_iommu_device *iommu, unsigned request
static const struct iommu_domain_ops riscv_iommu_domain_ops = {
.free = riscv_iommu_domain_free,
.attach_dev = riscv_iommu_attach_dev,
+ .set_dev_pasid = riscv_iommu_set_dev_pasid,
.map_pages = riscv_iommu_map_pages,
.unmap_pages = riscv_iommu_unmap_pages,
.iova_to_phys = riscv_iommu_iova_to_phys,
@@ -1326,9 +1898,13 @@ static const struct iommu_ops riscv_iommu_ops = {
.probe_device = riscv_iommu_probe_device,
.probe_finalize = riscv_iommu_probe_finalize,
.release_device = riscv_iommu_release_device,
+ .remove_dev_pasid = riscv_iommu_remove_dev_pasid,
.device_group = riscv_iommu_device_group,
.get_resv_regions = riscv_iommu_get_resv_regions,
.of_xlate = riscv_iommu_of_xlate,
+ .dev_enable_feat = riscv_iommu_dev_enable_feat,
+ .dev_disable_feat = riscv_iommu_dev_disable_feat,
+ .page_response = riscv_iommu_page_response,
.default_domain_ops = &riscv_iommu_domain_ops,
};

@@ -1340,6 +1916,7 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
riscv_iommu_queue_free(iommu, &iommu->cmdq);
riscv_iommu_queue_free(iommu, &iommu->fltq);
riscv_iommu_queue_free(iommu, &iommu->priq);
+ iopf_queue_free(iommu->pq_work);
}

int riscv_iommu_init(struct riscv_iommu_device *iommu)
@@ -1362,6 +1939,12 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
}
#endif

+ if (iommu->cap & RISCV_IOMMU_CAP_PD20)
+ iommu->iommu.max_pasids = 1u << 20;
+ else if (iommu->cap & RISCV_IOMMU_CAP_PD17)
+ iommu->iommu.max_pasids = 1u << 17;
+ else if (iommu->cap & RISCV_IOMMU_CAP_PD8)
+ iommu->iommu.max_pasids = 1u << 8;
/*
* Assign queue lengths from module parameters if not already
* set on the device tree.
@@ -1387,6 +1970,13 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
goto fail;
if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
goto no_ats;
+ /* PRI functionally depends on ATS’s capabilities. */
+ iommu->pq_work = iopf_queue_alloc(dev_name(dev));
+ if (!iommu->pq_work) {
+ dev_err(dev, "failed to allocate iopf queue\n");
+ ret = -ENOMEM;
+ goto fail;
+ }

ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
if (ret)
@@ -1424,5 +2014,6 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
riscv_iommu_queue_free(iommu, &iommu->priq);
riscv_iommu_queue_free(iommu, &iommu->fltq);
riscv_iommu_queue_free(iommu, &iommu->cmdq);
+ iopf_queue_free(iommu->pq_work);
return ret;
}
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index fe32a4eff14e..83e8d00fd0f8 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -17,9 +17,11 @@
#include <linux/iova.h>
#include <linux/io.h>
#include <linux/idr.h>
+#include <linux/mmu_notifier.h>
#include <linux/list.h>
#include <linux/iommu.h>
#include <linux/io-pgtable.h>
+#include <linux/mmu_notifier.h>

#include "iommu-bits.h"

@@ -76,6 +78,9 @@ struct riscv_iommu_device {
unsigned ddt_mode;
bool ddtp_in_iomem;

+ /* I/O page fault queue */
+ struct iopf_queue *pq_work;
+
/* hardware queues */
struct riscv_iommu_queue cmdq;
struct riscv_iommu_queue fltq;
@@ -91,11 +96,14 @@ struct riscv_iommu_domain {
struct io_pgtable pgtbl;

struct list_head endpoints;
+ struct list_head notifiers;
struct mutex lock;
+ struct mmu_notifier mn;
struct riscv_iommu_device *iommu;

unsigned mode; /* RIO_ATP_MODE_* enum */
unsigned pscid; /* RISC-V IOMMU PSCID */
+ ioasid_t pasid; /* IOMMU_DOMAIN_SVA: Cached PASID */

pgd_t *pgd_root; /* page table root pointer */
};
@@ -107,10 +115,16 @@ struct riscv_iommu_endpoint {
unsigned domid; /* PCI domain number, segment */
struct rb_node node; /* device tracking node (lookup by devid) */
struct riscv_iommu_dc *dc; /* device context pointer */
+ struct riscv_iommu_pc *pc; /* process context root, valid if pasid_enabled is true */
struct riscv_iommu_device *iommu; /* parent iommu device */

struct mutex lock;
struct list_head domain; /* endpoint attached managed domain */
+
+ /* end point info bits */
+ unsigned pasid_bits;
+ unsigned pasid_feat;
+ bool pasid_enabled;
};

/* Helper functions and macros */
--
2.34.1


2023-07-19 20:23:44

by Tomasz Jeznach

[permalink] [raw]
Subject: [PATCH 07/11] RISC-V: drivers/iommu/riscv: Add device context support

Introduces per device translation context, with 1,2 or 3 tree level
device tree structures.

Signed-off-by: Tomasz Jeznach <[email protected]>
---
drivers/iommu/riscv/iommu.c | 163 ++++++++++++++++++++++++++++++++++--
drivers/iommu/riscv/iommu.h | 1 +
2 files changed, 158 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 5c4cf9875302..9ee7d2b222b5 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -41,7 +41,7 @@ MODULE_ALIAS("riscv-iommu");
MODULE_LICENSE("GPL v2");

/* Global IOMMU params. */
-static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
+static int ddt_mode = RISCV_IOMMU_DDTP_MODE_3LVL;
module_param(ddt_mode, int, 0644);
MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");

@@ -452,6 +452,14 @@ static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
return riscv_iommu_post_sync(iommu, cmd, false);
}

+static bool riscv_iommu_iodir_inv_devid(struct riscv_iommu_device *iommu, unsigned devid)
+{
+ struct riscv_iommu_command cmd;
+ riscv_iommu_cmd_iodir_inval_ddt(&cmd);
+ riscv_iommu_cmd_iodir_set_did(&cmd, devid);
+ return riscv_iommu_post(iommu, &cmd);
+}
+
static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
{
struct riscv_iommu_command cmd;
@@ -671,6 +679,94 @@ static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
return false;
}

+/* TODO: implement proper device context management, e.g. teardown flow */
+
+/* Lookup or initialize device directory info structure. */
+static struct riscv_iommu_dc *riscv_iommu_get_dc(struct riscv_iommu_device *iommu,
+ unsigned devid)
+{
+ const bool base_format = !(iommu->cap & RISCV_IOMMU_CAP_MSI_FLAT);
+ unsigned depth = iommu->ddt_mode - RISCV_IOMMU_DDTP_MODE_1LVL;
+ u8 ddi_bits[3] = { 0 };
+ u64 *ddtp = NULL, ddt;
+
+ if (iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_OFF ||
+ iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE)
+ return NULL;
+
+ /* Make sure the mode is valid */
+ if (iommu->ddt_mode > RISCV_IOMMU_DDTP_MODE_MAX)
+ return NULL;
+
+ /*
+ * Device id partitioning for base format:
+ * DDI[0]: bits 0 - 6 (1st level) (7 bits)
+ * DDI[1]: bits 7 - 15 (2nd level) (9 bits)
+ * DDI[2]: bits 16 - 23 (3rd level) (8 bits)
+ *
+ * For extended format:
+ * DDI[0]: bits 0 - 5 (1st level) (6 bits)
+ * DDI[1]: bits 6 - 14 (2nd level) (9 bits)
+ * DDI[2]: bits 15 - 23 (3rd level) (9 bits)
+ */
+ if (base_format) {
+ ddi_bits[0] = 7;
+ ddi_bits[1] = 7 + 9;
+ ddi_bits[2] = 7 + 9 + 8;
+ } else {
+ ddi_bits[0] = 6;
+ ddi_bits[1] = 6 + 9;
+ ddi_bits[2] = 6 + 9 + 9;
+ }
+
+ /* Make sure device id is within range */
+ if (devid >= (1 << ddi_bits[depth]))
+ return NULL;
+
+ /* Get to the level of the non-leaf node that holds the device context */
+ for (ddtp = (u64 *) iommu->ddtp; depth-- > 0;) {
+ const int split = ddi_bits[depth];
+ /*
+ * Each non-leaf node is 64bits wide and on each level
+ * nodes are indexed by DDI[depth].
+ */
+ ddtp += (devid >> split) & 0x1FF;
+
+ retry:
+ /*
+ * Check if this node has been populated and if not
+ * allocate a new level and populate it.
+ */
+ ddt = READ_ONCE(*ddtp);
+ if (ddt & RISCV_IOMMU_DDTE_VALID) {
+ ddtp = __va(ppn_to_phys(ddt));
+ } else {
+ u64 old, new = get_zeroed_page(GFP_KERNEL);
+ if (!new)
+ return NULL;
+
+ old = cmpxchg64_relaxed(ddtp, ddt,
+ phys_to_ppn(__pa(new)) |
+ RISCV_IOMMU_DDTE_VALID);
+
+ if (old != ddt) {
+ free_page(new);
+ goto retry;
+ }
+
+ ddtp = (u64 *) new;
+ }
+ }
+
+ /*
+ * Grab the node that matches DDI[depth], note that when using base
+ * format the device context is 4 * 64bits, and the extended format
+ * is 8 * 64bits, hence the (3 - base_format) below.
+ */
+ ddtp += (devid & ((64 << base_format) - 1)) << (3 - base_format);
+ return (struct riscv_iommu_dc *)ddtp;
+}
+
static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
{
struct riscv_iommu_device *iommu;
@@ -708,6 +804,9 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
ep->iommu = iommu;
ep->dev = dev;

+ /* Initial DC pointer can be NULL if IOMMU is configured in OFF or BARE mode */
+ ep->dc = riscv_iommu_get_dc(iommu, ep->devid);
+
dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
ep->devid, ep->domid);

@@ -734,6 +833,16 @@ static void riscv_iommu_release_device(struct device *dev)
list_del(&ep->domain);
mutex_unlock(&ep->lock);

+ if (ep->dc) {
+ // this should be already done by domain detach.
+ ep->dc->tc = 0ULL;
+ wmb();
+ ep->dc->fsc = 0ULL;
+ ep->dc->iohgatp = 0ULL;
+ wmb();
+ riscv_iommu_iodir_inv_devid(iommu, ep->devid);
+ }
+
/* Remove endpoint from IOMMU tracking structures */
mutex_lock(&iommu->eps_mutex);
rb_erase(&ep->node, &iommu->eps);
@@ -853,11 +962,21 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
return 0;
}

+static u64 riscv_iommu_domain_atp(struct riscv_iommu_domain *domain)
+{
+ u64 atp = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, domain->mode);
+ if (domain->mode != RISCV_IOMMU_DC_FSC_MODE_BARE)
+ atp |= FIELD_PREP(RISCV_IOMMU_DC_FSC_PPN, virt_to_pfn(domain->pgd_root));
+ return atp;
+}
+
static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct device *dev)
{
struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+ struct riscv_iommu_dc *dc = ep->dc;
int ret;
+ u64 val;

/* PSCID not valid */
if ((int)domain->pscid < 0)
@@ -880,17 +999,44 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
return ret;
}

- if (ep->iommu->ddt_mode != RISCV_IOMMU_DDTP_MODE_BARE ||
- domain->domain.type != IOMMU_DOMAIN_IDENTITY) {
- dev_warn(dev, "domain type %d not supported\n",
- domain->domain.type);
+ if (ep->iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE &&
+ domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
+ dev_info(dev, "domain type %d attached w/ PSCID %u\n",
+ domain->domain.type, domain->pscid);
+ return 0;
+ }
+
+ if (!dc) {
return -ENODEV;
}

+ /*
+ * S-Stage translation table. G-Stage remains unmodified (BARE).
+ */
+ val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
+
+ dc->ta = cpu_to_le64(val);
+ dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+
+ wmb();
+
+ /* Mark device context as valid, synchronise device context cache. */
+ val = RISCV_IOMMU_DC_TC_V;
+
+ if (ep->iommu->cap & RISCV_IOMMU_CAP_AMO) {
+ val |= RISCV_IOMMU_DC_TC_GADE |
+ RISCV_IOMMU_DC_TC_SADE;
+ }
+
+ dc->tc = cpu_to_le64(val);
+ wmb();
+
list_add_tail(&ep->domain, &domain->endpoints);
mutex_unlock(&ep->lock);
mutex_unlock(&domain->lock);

+ riscv_iommu_iodir_inv_devid(ep->iommu, ep->devid);
+
dev_info(dev, "domain type %d attached w/ PSCID %u\n",
domain->domain.type, domain->pscid);

@@ -1239,7 +1385,12 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
goto fail;

no_ats:
- ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
+ if (iommu_default_passthrough()) {
+ dev_info(dev, "iommu set to passthrough mode\n");
+ ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
+ } else {
+ ret = riscv_iommu_enable(iommu, ddt_mode);
+ }

if (ret) {
dev_err(dev, "cannot enable iommu device (%d)\n", ret);
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index 04148a2a8ffd..9140df71e17b 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -105,6 +105,7 @@ struct riscv_iommu_endpoint {
unsigned devid; /* PCI bus:device:function number */
unsigned domid; /* PCI domain number, segment */
struct rb_node node; /* device tracking node (lookup by devid) */
+ struct riscv_iommu_dc *dc; /* device context pointer */
struct riscv_iommu_device *iommu; /* parent iommu device */

struct mutex lock;
--
2.34.1


2023-07-19 20:51:08

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

Hey Tomasz,

On Wed, Jul 19, 2023 at 12:33:47PM -0700, Tomasz Jeznach wrote:
> From: Anup Patel <[email protected]>
>
> We add DT bindings document for RISC-V IOMMU platform and PCI devices
> defined by the RISC-V IOMMU specification.
>
> Signed-off-by: Anup Patel <[email protected]>

Your signoff is missing from here.

Secondly, as get_maintainer.pl would have told you, dt-bindings patches
need to be sent to the dt-binding maintainers and list.
+CC maintainers & list.

Thirdly, dt-binding patches should come before their users.

> ---
> .../bindings/iommu/riscv,iommu.yaml | 146 ++++++++++++++++++
> 1 file changed, 146 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
>
> diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> new file mode 100644
> index 000000000000..8a9aedb61768
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> @@ -0,0 +1,146 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: RISC-V IOMMU Implementation
> +
> +maintainers:
> + - Tomasz Jeznach <[email protected]>

What about Anup, who seems to have written this?
Or your co-authors of the drivers?

> +
> +description:
> + The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
> + which can be a regular platform device or a PCI device connected to
> + the host root port.
> +
> + The RISC-V IOMMU provides two stage translation, device directory table,
> + command queue and fault reporting as wired interrupt or MSIx event for
> + both PCI and platform devices.
> +
> + Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> +
> +properties:
> + compatible:
> + oneOf:
> + - description: RISC-V IOMMU as a platform device
> + items:
> + - enum:
> + - vendor,chip-iommu

These dummy compatibles are not valid, as was pointed out to Anup on
the AIA series. Please go look at what was done there instead:
https://lore.kernel.org/all/[email protected]/

> + - const: riscv,iommu
> +
> + - description: RISC-V IOMMU as a PCI device connected to root port
> + items:
> + - enum:
> + - vendor,chip-pci-iommu
> + - const: riscv,pci-iommu

I'm not really au fait with the arm smmu stuff, but do any of its
versions support being connected to a root port?

> + reg:
> + maxItems: 1
> + description:
> + For RISC-V IOMMU as a platform device, this represents the MMIO base
> + address of registers.
> +
> + For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> + details as described in Documentation/devicetree/bindings/pci/pci.txt
> +
> + '#iommu-cells':
> + const: 2
> + description: |

|s are only needed where formatting needs to be preserved.

> + Each IOMMU specifier represents the base device ID and number of
> + device IDs.
> +
> + interrupts:
> + minItems: 1
> + maxItems: 16

What are any of these interrupts?

> + description:
> + The presence of this property implies that given RISC-V IOMMU uses
> + wired interrupts to notify the RISC-V HARTS (or CPUs).
> +
> + msi-parent:
> + description:
> + The presence of this property implies that given RISC-V IOMMU uses
> + MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> + considered only when the interrupts property is absent.
> +
> + dma-coherent:

RISC-V is dma-coherent by default, should this not be dma-noncoherent
instead?

> + description:
> + Present if page table walks and DMA accessed made by the RISC-V IOMMU
> + are cache coherent with the CPU.
> +
> + power-domains:
> + maxItems: 1
> +
> +required:
> + - compatible
> + - reg
> + - '#iommu-cells'
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + /* Example 1 (IOMMU platform device with wired interrupts) */
> + immu1: iommu@1bccd000 {

Why is this "immu"? typo or intentional?

> + compatible = "vendor,chip-iommu", "riscv,iommu";
> + reg = <0x1bccd000 0x1000>;
> + interrupt-parent = <&aplic_smode>;
> + interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> + #iommu-cells = <2>;
> + };
> +
> + /* Device with two IOMMU device IDs, 0 and 7 */
> + master1 {
> + iommus = <&immu1 0 1>, <&immu1 7 1>;
> + };
> +
> + - |
> + /* Example 2 (IOMMU platform device with MSIs) */
> + immu2: iommu@1bcdd000 {
> + compatible = "vendor,chip-iommu", "riscv,iommu";
> + reg = <0x1bccd000 0x1000>;
> + msi-parent = <&imsics_smode>;
> + #iommu-cells = <2>;
> + };
> +
> + bus {
> + #address-cells = <2>;
> + #size-cells = <2>;
> +
> + /* Device with IOMMU device IDs ranging from 32 to 64 */
> + master1 {
> + iommus = <&immu2 32 32>;
> + };
> +
> + pcie@40000000 {
> + compatible = "pci-host-cam-generic";
> + device_type = "pci";
> + #address-cells = <3>;
> + #size-cells = <2>;
> + bus-range = <0x0 0x1>;
> +
> + /* CPU_PHYSICAL(2) SIZE(2) */

These sort of comments seem to just repeat what address-cells &
size-cells has already said, no?

Thanks,
Conor.


Attachments:
(No filename) (5.58 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-19 21:20:46

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Wed, Jul 19, 2023 at 01:52:28PM -0700, Tomasz Jeznach wrote:
> On Wed, Jul 19, 2023 at 1:19 PM Conor Dooley <[email protected]> wrote:
>
> > Hey Tomasz,
> >
> > On Wed, Jul 19, 2023 at 12:33:47PM -0700, Tomasz Jeznach wrote:
> > > From: Anup Patel <[email protected]>
> > >
> > > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > > defined by the RISC-V IOMMU specification.
> > >
> > > Signed-off-by: Anup Patel <[email protected]>
> >
> > Your signoff is missing from here.
> >
> > Secondly, as get_maintainer.pl would have told you, dt-bindings patches
> > need to be sent to the dt-binding maintainers and list.
> > +CC maintainers & list.
> >
> > Thirdly, dt-binding patches should come before their users.
> >
>
>
> Thank you for pointing out and adding DT maintainers.
> The signoff is definitely missing, and I'll will amend with other fixes /
> reordering.

Yeah, please wait until you get actual feedback on the drivers etc
though before you do that.

Also, don't send html emails to the mailing lists. They will be rejected
and those outside of direct-cc will not see the emails.

> > > ---
> > > .../bindings/iommu/riscv,iommu.yaml | 146 ++++++++++++++++++
> > > 1 file changed, 146 insertions(+)
> > > create mode 100644
> > Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > >
> > > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > new file mode 100644
> > > index 000000000000..8a9aedb61768
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > @@ -0,0 +1,146 @@
> > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > +
> > > +title: RISC-V IOMMU Implementation
> > > +
> > > +maintainers:
> > > + - Tomasz Jeznach <[email protected]>
> >
> > What about Anup, who seems to have written this?
> > Or your co-authors of the drivers?
> >
> >
> Anup provided only device tree riscv,iommu bindings proposal, but handed
> over its maintenance.
>
> > +
> > > +description:
> > > + The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
> > > + which can be a regular platform device or a PCI device connected to
> > > + the host root port.
> > > +
> > > + The RISC-V IOMMU provides two stage translation, device directory
> > table,
> > > + command queue and fault reporting as wired interrupt or MSIx event for
> > > + both PCI and platform devices.
> > > +
> > > + Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > > +
> > > +properties:
> > > + compatible:
> > > + oneOf:
> > > + - description: RISC-V IOMMU as a platform device
> > > + items:
> > > + - enum:
> > > + - vendor,chip-iommu
> >
> > These dummy compatibles are not valid, as was pointed out to Anup on
> > the AIA series. Please go look at what was done there instead:
> >
> > https://lore.kernel.org/all/[email protected]/
> >
> >
> Thank you, good pointer, seams like the same comments apply here. Will go
> through the discussion and update.
>
>
> > > + - const: riscv,iommu
> > > +
> > > + - description: RISC-V IOMMU as a PCI device connected to root port
> > > + items:
> > > + - enum:
> > > + - vendor,chip-pci-iommu
> > > + - const: riscv,pci-iommu
> >
> > I'm not really au fait with the arm smmu stuff, but do any of its
> > versions support being connected to a root port?
> >
> >
> RISC-V IOMMU allows them to be connected to the root port, or presented as
> a platform device.

That is not quite what I asked... What I want to know is why we are
doing something different to Arm's SMMU stuff & whether it is because
RISC-V has extra capabilities, or the binding itself is flawed.

(There's no more comments from me below, just making sure the mail's
contents reaches lore)

Cheers,
Conor.

> > > + reg:
> > > + maxItems: 1
> > > + description:
> > > + For RISC-V IOMMU as a platform device, this represents the MMIO
> > base
> > > + address of registers.
> > > +
> > > + For RISC-V IOMMU as a PCI device, this represents the PCI-PCI
> > bridge
> > > + details as described in
> > Documentation/devicetree/bindings/pci/pci.txt
> > > +
> > > + '#iommu-cells':
> > > + const: 2
> > > + description: |
> >
> > |s are only needed where formatting needs to be preserved.
> >
> > > + Each IOMMU specifier represents the base device ID and number of
> > > + device IDs.
> > > +
> > > + interrupts:
> > > + minItems: 1
> > > + maxItems: 16
> >
> > What are any of these interrupts?
> >
> >
> I'll add a description to the file. In short queue interfaces signalling to
> the driver.
>
>
> > + description:
> > > + The presence of this property implies that given RISC-V IOMMU uses
> > > + wired interrupts to notify the RISC-V HARTS (or CPUs).
> > > +
> > > + msi-parent:
> > > + description:
> > > + The presence of this property implies that given RISC-V IOMMU uses
> > > + MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > > + considered only when the interrupts property is absent.
> > > +
> > > + dma-coherent:
> >
> > RISC-V is dma-coherent by default, should this not be dma-noncoherent
> > instead?
> >
> >
> Very valid comment. I'm ok to reverse the flag unless anyone objects.
>
>
> > > + description:
> > > + Present if page table walks and DMA accessed made by the RISC-V
> > IOMMU
> > > + are cache coherent with the CPU.
> > > +
> > > + power-domains:
> > > + maxItems: 1
> > > +
> > > +required:
> > > + - compatible
> > > + - reg
> > > + - '#iommu-cells'
> > > +
> > > +additionalProperties: false
> > > +
> > > +examples:
> > > + - |
> > > + /* Example 1 (IOMMU platform device with wired interrupts) */
> > > + immu1: iommu@1bccd000 {
> >
> > Why is this "immu"? typo or intentional?
> >
>
> I guess there was no particular naming schema here, but I might defer this
> question to the author.
>
>
> >
> > > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > > + reg = <0x1bccd000 0x1000>;
> > > + interrupt-parent = <&aplic_smode>;
> > > + interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > > + #iommu-cells = <2>;
> > > + };
> > > +
> > > + /* Device with two IOMMU device IDs, 0 and 7 */
> > > + master1 {
> > > + iommus = <&immu1 0 1>, <&immu1 7 1>;
> > > + };
> > > +
> > > + - |
> > > + /* Example 2 (IOMMU platform device with MSIs) */
> > > + immu2: iommu@1bcdd000 {
> > > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > > + reg = <0x1bccd000 0x1000>;
> > > + msi-parent = <&imsics_smode>;
> > > + #iommu-cells = <2>;
> > > + };
> > > +
> > > + bus {
> > > + #address-cells = <2>;
> > > + #size-cells = <2>;
> > > +
> > > + /* Device with IOMMU device IDs ranging from 32 to 64 */
> > > + master1 {
> > > + iommus = <&immu2 32 32>;
> > > + };
> > > +
> > > + pcie@40000000 {
> > > + compatible = "pci-host-cam-generic";
> > > + device_type = "pci";
> > > + #address-cells = <3>;
> > > + #size-cells = <2>;
> > > + bus-range = <0x0 0x1>;
> > > +
> > > + /* CPU_PHYSICAL(2) SIZE(2) */
> >
> > These sort of comments seem to just repeat what address-cells &
> > size-cells has already said, no?
> >
> >
> Correct.
>
>
>
> > Thanks,
> > Conor.
> >
>
>
> Thank you Conor for prompt response and comments.
> I'll address them in the next version.
>
> - Tomasz


Attachments:
(No filename) (7.94 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-19 22:26:06

by Rob Herring

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Wed, Jul 19, 2023 at 2:19 PM Conor Dooley <[email protected]> wrote:
>
> Hey Tomasz,
>
> On Wed, Jul 19, 2023 at 12:33:47PM -0700, Tomasz Jeznach wrote:
> > From: Anup Patel <[email protected]>
> >
> > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > defined by the RISC-V IOMMU specification.
> >
> > Signed-off-by: Anup Patel <[email protected]>
>
> Your signoff is missing from here.
>
> Secondly, as get_maintainer.pl would have told you, dt-bindings patches
> need to be sent to the dt-binding maintainers and list.
> +CC maintainers & list.
>
> Thirdly, dt-binding patches should come before their users.
>
> > ---
> > .../bindings/iommu/riscv,iommu.yaml | 146 ++++++++++++++++++
> > 1 file changed, 146 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > new file mode 100644
> > index 000000000000..8a9aedb61768
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > @@ -0,0 +1,146 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: RISC-V IOMMU Implementation
> > +
> > +maintainers:
> > + - Tomasz Jeznach <[email protected]>
>
> What about Anup, who seems to have written this?
> Or your co-authors of the drivers?
>
> > +
> > +description:
> > + The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms

typo

> > + which can be a regular platform device or a PCI device connected to
> > + the host root port.
> > +
> > + The RISC-V IOMMU provides two stage translation, device directory table,
> > + command queue and fault reporting as wired interrupt or MSIx event for
> > + both PCI and platform devices.

TBC, you want a PCI device that's an IOMMU and the IOMMU serves
(provides translation for) PCI devices?

> > +
> > + Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > +
> > +properties:
> > + compatible:
> > + oneOf:
> > + - description: RISC-V IOMMU as a platform device

"platform device" is a Linux term. Don't use Linux terms in bindings.

> > + items:
> > + - enum:
> > + - vendor,chip-iommu
>
> These dummy compatibles are not valid, as was pointed out to Anup on
> the AIA series. Please go look at what was done there instead:
> https://lore.kernel.org/all/[email protected]/
>
> > + - const: riscv,iommu
> > +
> > + - description: RISC-V IOMMU as a PCI device connected to root port
> > + items:
> > + - enum:
> > + - vendor,chip-pci-iommu
> > + - const: riscv,pci-iommu
>
> I'm not really au fait with the arm smmu stuff, but do any of its
> versions support being connected to a root port?

PCI devices have a defined format for the compatible string based on
VID/PID. For PCI, also usually don't need to be described in DT
because they are discoverable. The exception is when there's parts
which aren't. Which parts aren't?

> > + reg:
> > + maxItems: 1
> > + description:
> > + For RISC-V IOMMU as a platform device, this represents the MMIO base
> > + address of registers.
> > +
> > + For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge

Your IOMMU is also a PCI-PCI bridge? Is that a normal PCI thing?


> > + details as described in Documentation/devicetree/bindings/pci/pci.txt

Don't refer to pci.txt. It is going to be removed.

> > +
> > + '#iommu-cells':
> > + const: 2
> > + description: |
>
> |s are only needed where formatting needs to be preserved.
>
> > + Each IOMMU specifier represents the base device ID and number of
> > + device IDs.

Doesn't that assume device IDs are contiguous? Generally not a safe assumption.

> > +
> > + interrupts:
> > + minItems: 1
> > + maxItems: 16
>
> What are any of these interrupts?
>
> > + description:
> > + The presence of this property implies that given RISC-V IOMMU uses
> > + wired interrupts to notify the RISC-V HARTS (or CPUs).
> > +
> > + msi-parent:
> > + description:
> > + The presence of this property implies that given RISC-V IOMMU uses
> > + MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > + considered only when the interrupts property is absent.

This doesn't make sense for a PCI device. PCI defines its own way to
describe MSI support.

> > +
> > + dma-coherent:
>
> RISC-V is dma-coherent by default, should this not be dma-noncoherent
> instead?
>
> > + description:
> > + Present if page table walks and DMA accessed made by the RISC-V IOMMU
> > + are cache coherent with the CPU.
> > +
> > + power-domains:
> > + maxItems: 1
> > +
> > +required:
> > + - compatible
> > + - reg
> > + - '#iommu-cells'
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > + - |
> > + /* Example 1 (IOMMU platform device with wired interrupts) */
> > + immu1: iommu@1bccd000 {
>
> Why is this "immu"? typo or intentional?
>
> > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > + reg = <0x1bccd000 0x1000>;
> > + interrupt-parent = <&aplic_smode>;
> > + interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > + #iommu-cells = <2>;
> > + };
> > +
> > + /* Device with two IOMMU device IDs, 0 and 7 */
> > + master1 {
> > + iommus = <&immu1 0 1>, <&immu1 7 1>;
> > + };
> > +
> > + - |
> > + /* Example 2 (IOMMU platform device with MSIs) */
> > + immu2: iommu@1bcdd000 {
> > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > + reg = <0x1bccd000 0x1000>;
> > + msi-parent = <&imsics_smode>;
> > + #iommu-cells = <2>;
> > + };
> > +
> > + bus {
> > + #address-cells = <2>;
> > + #size-cells = <2>;
> > +
> > + /* Device with IOMMU device IDs ranging from 32 to 64 */
> > + master1 {
> > + iommus = <&immu2 32 32>;
> > + };
> > +
> > + pcie@40000000 {
> > + compatible = "pci-host-cam-generic";
> > + device_type = "pci";
> > + #address-cells = <3>;
> > + #size-cells = <2>;
> > + bus-range = <0x0 0x1>;
> > +
> > + /* CPU_PHYSICAL(2) SIZE(2) */

I'm guessing there was more after this, but I don't have it...

Guessing, immu2 is a PCI device, but it translates for master1 which
is not a PCI device? Weird. Why would anyone build such a thing?


Rob

2023-07-19 23:50:43

by Tomasz Jeznach

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Wed, Jul 19, 2023 at 2:37 PM Rob Herring <[email protected]> wrote:
>
> On Wed, Jul 19, 2023 at 2:19 PM Conor Dooley <[email protected]> wrote:
> >
> > Hey Tomasz,
> >
> > On Wed, Jul 19, 2023 at 12:33:47PM -0700, Tomasz Jeznach wrote:
> > > From: Anup Patel <[email protected]>
> > >
> > > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > > defined by the RISC-V IOMMU specification.
> > >
> > > Signed-off-by: Anup Patel <[email protected]>
> >
> > Your signoff is missing from here.
> >
> > Secondly, as get_maintainer.pl would have told you, dt-bindings patches
> > need to be sent to the dt-binding maintainers and list.
> > +CC maintainers & list.
> >
> > Thirdly, dt-binding patches should come before their users.
> >
> > > ---
> > > .../bindings/iommu/riscv,iommu.yaml | 146 ++++++++++++++++++
> > > 1 file changed, 146 insertions(+)
> > > create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > >
> > > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > new file mode 100644
> > > index 000000000000..8a9aedb61768
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > @@ -0,0 +1,146 @@
> > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > +
> > > +title: RISC-V IOMMU Implementation
> > > +
> > > +maintainers:
> > > + - Tomasz Jeznach <[email protected]>
> >
> > What about Anup, who seems to have written this?
> > Or your co-authors of the drivers?
> >
> > > +
> > > +description:
> > > + The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
>
> typo
>

ack

> > > + which can be a regular platform device or a PCI device connected to
> > > + the host root port.
> > > +
> > > + The RISC-V IOMMU provides two stage translation, device directory table,
> > > + command queue and fault reporting as wired interrupt or MSIx event for
> > > + both PCI and platform devices.
>
> TBC, you want a PCI device that's an IOMMU and the IOMMU serves
> (provides translation for) PCI devices?
>

Yes, IOMMU as a PCIe device providing address translation services for
connect PCIe root complex.

> > > +
> > > + Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > > +
> > > +properties:
> > > + compatible:
> > > + oneOf:
> > > + - description: RISC-V IOMMU as a platform device
>
> "platform device" is a Linux term. Don't use Linux terms in bindings.
>

ack.


> > > + items:
> > > + - enum:
> > > + - vendor,chip-iommu
> >
> > These dummy compatibles are not valid, as was pointed out to Anup on
> > the AIA series. Please go look at what was done there instead:
> > https://lore.kernel.org/all/[email protected]/
> >
> > > + - const: riscv,iommu
> > > +
> > > + - description: RISC-V IOMMU as a PCI device connected to root port
> > > + items:
> > > + - enum:
> > > + - vendor,chip-pci-iommu
> > > + - const: riscv,pci-iommu
> >
> > I'm not really au fait with the arm smmu stuff, but do any of its
> > versions support being connected to a root port?
>
> PCI devices have a defined format for the compatible string based on
> VID/PID. For PCI, also usually don't need to be described in DT
> because they are discoverable. The exception is when there's parts
> which aren't. Which parts aren't?
>

We've put 'riscv,pci-iommu' node here to describe relationship between PCIe
devices and IOMMU(s), needed for the pcie root complex description (iommu-map).
If there is a better way to reference PCI-IOMMU without adding
pci-iommu definition
that would solve the problem. Every other property of pci-iommu should
be discoverable.

> > > + reg:
> > > + maxItems: 1
> > > + description:
> > > + For RISC-V IOMMU as a platform device, this represents the MMIO base
> > > + address of registers.
> > > +
> > > + For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
>
> Your IOMMU is also a PCI-PCI bridge? Is that a normal PCI thing?
>

It's allowed to be integrated with root complex / IO bridge, but it is
as a separate PCIe device.
I'll clarify the description.

>
> > > + details as described in Documentation/devicetree/bindings/pci/pci.txt
>
> Don't refer to pci.txt. It is going to be removed.
>

ack.

> > > +
> > > + '#iommu-cells':
> > > + const: 2
> > > + description: |
> >
> > |s are only needed where formatting needs to be preserved.
> >
> > > + Each IOMMU specifier represents the base device ID and number of
> > > + device IDs.
>
> Doesn't that assume device IDs are contiguous? Generally not a safe assumption.
>

ack.

> > > +
> > > + interrupts:
> > > + minItems: 1
> > > + maxItems: 16
> >
> > What are any of these interrupts?
> >
> > > + description:
> > > + The presence of this property implies that given RISC-V IOMMU uses
> > > + wired interrupts to notify the RISC-V HARTS (or CPUs).
> > > +
> > > + msi-parent:
> > > + description:
> > > + The presence of this property implies that given RISC-V IOMMU uses
> > > + MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > > + considered only when the interrupts property is absent.
>
> This doesn't make sense for a PCI device. PCI defines its own way to
> describe MSI support.
>

Agree, this is for IOMMU as a non-PCI device, capable of sending MSI.
Follows 'MSI clients' notes from
devicetree/bindings/interrupt-controller/msi.txt
Is this a proper way to describe this relationship?

> > > +
> > > + dma-coherent:
> >
> > RISC-V is dma-coherent by default, should this not be dma-noncoherent
> > instead?
> >
> > > + description:
> > > + Present if page table walks and DMA accessed made by the RISC-V IOMMU
> > > + are cache coherent with the CPU.
> > > +
> > > + power-domains:
> > > + maxItems: 1
> > > +
> > > +required:
> > > + - compatible
> > > + - reg
> > > + - '#iommu-cells'
> > > +
> > > +additionalProperties: false
> > > +
> > > +examples:
> > > + - |
> > > + /* Example 1 (IOMMU platform device with wired interrupts) */
> > > + immu1: iommu@1bccd000 {
> >
> > Why is this "immu"? typo or intentional?
> >
> > > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > > + reg = <0x1bccd000 0x1000>;
> > > + interrupt-parent = <&aplic_smode>;
> > > + interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > > + #iommu-cells = <2>;
> > > + };
> > > +
> > > + /* Device with two IOMMU device IDs, 0 and 7 */
> > > + master1 {
> > > + iommus = <&immu1 0 1>, <&immu1 7 1>;
> > > + };
> > > +
> > > + - |
> > > + /* Example 2 (IOMMU platform device with MSIs) */
> > > + immu2: iommu@1bcdd000 {
> > > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > > + reg = <0x1bccd000 0x1000>;
> > > + msi-parent = <&imsics_smode>;
> > > + #iommu-cells = <2>;
> > > + };
> > > +
> > > + bus {
> > > + #address-cells = <2>;
> > > + #size-cells = <2>;
> > > +
> > > + /* Device with IOMMU device IDs ranging from 32 to 64 */
> > > + master1 {
> > > + iommus = <&immu2 32 32>;
> > > + };
> > > +
> > > + pcie@40000000 {
> > > + compatible = "pci-host-cam-generic";
> > > + device_type = "pci";
> > > + #address-cells = <3>;
> > > + #size-cells = <2>;
> > > + bus-range = <0x0 0x1>;
> > > +
> > > + /* CPU_PHYSICAL(2) SIZE(2) */
>
> I'm guessing there was more after this, but I don't have it...

Complete patch 3 is at:
https://lore.kernel.org/linux-iommu/[email protected]/T/#mbf8dc4098fb09b87b2618c5c545ae882f11b114b

>
> Guessing, immu2 is a PCI device, but it translates for master1 which
> is not a PCI device? Weird. Why would anyone build such a thing?
>

In this example immu2 is a non-PCI device. Agree, otherwise would be weird.

>
> Rob

Thank you,
- Tomasz

2023-07-20 04:20:14

by Nick Kossifidis

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

Hello Tomasz,

On 7/19/23 22:33, Tomasz Jeznach wrote:
> Enables message or wire signal interrupts for PCIe and platforms devices.
>

The description doesn't match the subject nor the patch content (we
don't jus enable interrupts, we also init the queues).

> + /* Parse Queue lengts */
> + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> + if (!ret)
> + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> +
> + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> + if (!ret)
> + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> +
> + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> + if (!ret)
> + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> +
> dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
>

We need to add those to the device tree binding doc (or throw them away,
I thought it would be better to have them as part of the device
desciption than a module parameter).


> +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> +

> + case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> + q = &iommu->priq;
> + q->len = sizeof(struct riscv_iommu_pq_record);
> + count = iommu->priq_len;
> + irq = iommu->irq_priq;
> + irq_check = riscv_iommu_priq_irq_check;
> + irq_process = riscv_iommu_priq_process;
> + q->qbr = RISCV_IOMMU_REG_PQB;
> + q->qcr = RISCV_IOMMU_REG_PQCSR;
> + name = "priq";
> + break;


It makes more sense to add the code for the page request queue in the
patch that adds ATS/PRI support IMHO. This comment also applies to its
interrupt handlers below.


> +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> + u64 addr)
> +{
> + cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> + cmd->dword1 = addr;
> +}
> +

This needs to be (addr >> 2) to match the spec, same as in the iofence
command.

Regards,
Nick


2023-07-20 07:15:09

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface

On 19/07/2023 21:33, Tomasz Jeznach wrote:
> Enable sysfs debug / visibility interface providing restricted
> access to hardware registers.

Please use subject prefixes matching the subsystem. You can get them for
example with `git log --oneline -- DIRECTORY_OR_FILE` on the directory
your patch is touching.

>
> Signed-off-by: Tomasz Jeznach <[email protected]>
> ---
> drivers/iommu/riscv/Makefile | 2 +-
> drivers/iommu/riscv/iommu-sysfs.c | 183 ++++++++++++++++++++++++++++++
> drivers/iommu/riscv/iommu.c | 7 ++
> drivers/iommu/riscv/iommu.h | 2 +
> 4 files changed, 193 insertions(+), 1 deletion(-)
> create mode 100644 drivers/iommu/riscv/iommu-sysfs.c
>
> diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> index 38730c11e4a8..9523eb053cfc 100644
> --- a/drivers/iommu/riscv/Makefile
> +++ b/drivers/iommu/riscv/Makefile
> @@ -1 +1 @@
> -obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o
> \ No newline at end of file
> +obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o
> \ No newline at end of file

You have this error in multiple places.

> diff --git a/drivers/iommu/riscv/iommu-sysfs.c b/drivers/iommu/riscv/iommu-sysfs.c
> new file mode 100644
> index 000000000000..f038ea8445c5
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-sysfs.c
> @@ -0,0 +1,183 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * IOMMU API for RISC-V architected Ziommu implementations.
> + *
> + * Copyright © 2022-2023 Rivos Inc.
> + *
> + * Author: Tomasz Jeznach <[email protected]>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/compiler.h>
> +#include <linux/iommu.h>
> +#include <linux/platform_device.h>
> +#include <asm/page.h>
> +
> +#include "iommu.h"
> +
> +#define sysfs_dev_to_iommu(dev) \
> + container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> +
> +static ssize_t address_show(struct device *dev,
> + struct device_attribute *attr, char *buf)


Where is the sysfs ABI documented?


Best regards,
Krzysztof


2023-07-20 13:27:10

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On 2023/7/20 3:33, Tomasz Jeznach wrote:
> Enables message or wire signal interrupts for PCIe and platforms devices.

If this patch could be divided into multiple small patches, each
logically doing one specific thing, it will help people better review
the code.

Best regards,
baolu

2023-07-20 13:27:16

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface

On 2023/7/20 3:33, Tomasz Jeznach wrote:
> +#define sysfs_dev_to_iommu(dev) \
> + container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> +
> +static ssize_t address_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);
> + return sprintf(buf, "%llx\n", iommu->reg_phys);

Use sysfs_emit() please.

> +}
> +
> +static DEVICE_ATTR_RO(address);
> +
> +#define ATTR_RD_REG32(name, offset) \
> + ssize_t reg_ ## name ## _show(struct device *dev, \
> + struct device_attribute *attr, char *buf) \
> +{ \
> + struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
> + return sprintf(buf, "0x%x\n", \
> + riscv_iommu_readl(iommu, offset)); \
> +}
> +
> +#define ATTR_RD_REG64(name, offset) \
> + ssize_t reg_ ## name ## _show(struct device *dev, \
> + struct device_attribute *attr, char *buf) \
> +{ \
> + struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
> + return sprintf(buf, "0x%llx\n", \
> + riscv_iommu_readq(iommu, offset)); \
> +}
> +
> +#define ATTR_WR_REG32(name, offset) \
> + ssize_t reg_ ## name ## _store(struct device *dev, \
> + struct device_attribute *attr, \
> + const char *buf, size_t len) \
> +{ \
> + struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
> + unsigned long val; \
> + int ret; \
> + ret = kstrtoul(buf, 0, &val); \
> + if (ret) \
> + return ret; \
> + riscv_iommu_writel(iommu, offset, val); \
> + return len; \
> +}
> +
> +#define ATTR_WR_REG64(name, offset) \
> + ssize_t reg_ ## name ## _store(struct device *dev, \
> + struct device_attribute *attr, \
> + const char *buf, size_t len) \
> +{ \
> + struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
> + unsigned long long val; \
> + int ret; \
> + ret = kstrtoull(buf, 0, &val); \
> + if (ret) \
> + return ret; \
> + riscv_iommu_writeq(iommu, offset, val); \
> + return len; \
> +}

So this allows users to change the registers through sysfs? How does
it synchronize with the iommu driver?

Best regards,
baolu

2023-07-20 17:56:49

by Tomasz Jeznach

[permalink] [raw]
Subject: Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface

On Thu, Jul 20, 2023 at 5:51 AM Baolu Lu <[email protected]> wrote:
>
> On 2023/7/20 3:33, Tomasz Jeznach wrote:
> > +#define sysfs_dev_to_iommu(dev) \
> > + container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> > +
> > +static ssize_t address_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);
> > + return sprintf(buf, "%llx\n", iommu->reg_phys);
>
> Use sysfs_emit() please.
>

ack. Thanks, will update.

> > +}
> > +
> > +static DEVICE_ATTR_RO(address);
> > +
> > +#define ATTR_RD_REG32(name, offset) \
> > + ssize_t reg_ ## name ## _show(struct device *dev, \
> > + struct device_attribute *attr, char *buf) \
> > +{ \
> > + struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
> > + return sprintf(buf, "0x%x\n", \
> > + riscv_iommu_readl(iommu, offset)); \
> > +}
> > +
> > +#define ATTR_RD_REG64(name, offset) \
> > + ssize_t reg_ ## name ## _show(struct device *dev, \
> > + struct device_attribute *attr, char *buf) \
> > +{ \
> > + struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
> > + return sprintf(buf, "0x%llx\n", \
> > + riscv_iommu_readq(iommu, offset)); \
> > +}
> > +
> > +#define ATTR_WR_REG32(name, offset) \
> > + ssize_t reg_ ## name ## _store(struct device *dev, \
> > + struct device_attribute *attr, \
> > + const char *buf, size_t len) \
> > +{ \
> > + struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
> > + unsigned long val; \
> > + int ret; \
> > + ret = kstrtoul(buf, 0, &val); \
> > + if (ret) \
> > + return ret; \
> > + riscv_iommu_writel(iommu, offset, val); \
> > + return len; \
> > +}
> > +
> > +#define ATTR_WR_REG64(name, offset) \
> > + ssize_t reg_ ## name ## _store(struct device *dev, \
> > + struct device_attribute *attr, \
> > + const char *buf, size_t len) \
> > +{ \
> > + struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev); \
> > + unsigned long long val; \
> > + int ret; \
> > + ret = kstrtoull(buf, 0, &val); \
> > + if (ret) \
> > + return ret; \
> > + riscv_iommu_writeq(iommu, offset, val); \
> > + return len; \
> > +}
>
> So this allows users to change the registers through sysfs? How does
> it synchronize with the iommu driver?
>

The only writable registers are for debug interface and performance
monitoring counters, without any synchronization requirements between
user / driver. In follow up patch series performance counters will be
also removed from sysfs, replaced by integration with perfmon
subsystem. The only remaining will be a debug access, providing user
access to address translation, in short it provides an interface to
query SPA based on provided IOVA/RID/PASID. There was a discussion in
RVI IOMMU TG forum if it's acceptable to expose such an interface to
the privileged user, and the conclusion was that it's very likely not
exposing more info than privileged users already are able to acquire
by looking at in-memory data structures.

Read-only registers are to provide debug access to track queue
head/tail pointers and interrupt states.

> Best regards,
> baolu

regards,
- Tomasz

2023-07-20 18:14:14

by Tomasz Jeznach

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Thu, Jul 20, 2023 at 6:18 AM Baolu Lu <[email protected]> wrote:
>
> On 2023/7/20 3:33, Tomasz Jeznach wrote:
> > Enables message or wire signal interrupts for PCIe and platforms devices.
>
> If this patch could be divided into multiple small patches, each
> logically doing one specific thing, it will help people better review
> the code.
>

ack. I've got a similar comment regarding this patch already.
I will split and add more notes to the commit message. Thanks.


> Best regards,
> baolu
>


regards,
- Tomasz

2023-07-20 18:35:55

by Tomasz Jeznach

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Wed, Jul 19, 2023 at 8:12 PM Nick Kossifidis <[email protected]> wrote:
>
> Hello Tomasz,
>
> On 7/19/23 22:33, Tomasz Jeznach wrote:
> > Enables message or wire signal interrupts for PCIe and platforms devices.
> >
>
> The description doesn't match the subject nor the patch content (we
> don't jus enable interrupts, we also init the queues).
>
> > + /* Parse Queue lengts */
> > + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > + if (!ret)
> > + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > +
> > + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > + if (!ret)
> > + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > +
> > + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > + if (!ret)
> > + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > +
> > dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> >
>
> We need to add those to the device tree binding doc (or throw them away,
> I thought it would be better to have them as part of the device
> desciption than a module parameter).
>

We can add them as an optional fields to DT.
Alternatively, I've been looking into an option to auto-scale CQ/PQ
based on number of attached devices, but this gets trickier for
hot-pluggable systems. I've added module parameters as a bare-minimum,
but still looking for better solutions.

>
> > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > +
>
> > + case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > + q = &iommu->priq;
> > + q->len = sizeof(struct riscv_iommu_pq_record);
> > + count = iommu->priq_len;
> > + irq = iommu->irq_priq;
> > + irq_check = riscv_iommu_priq_irq_check;
> > + irq_process = riscv_iommu_priq_process;
> > + q->qbr = RISCV_IOMMU_REG_PQB;
> > + q->qcr = RISCV_IOMMU_REG_PQCSR;
> > + name = "priq";
> > + break;
>
>
> It makes more sense to add the code for the page request queue in the
> patch that adds ATS/PRI support IMHO. This comment also applies to its
> interrupt handlers below.
>

ack. will do.

>
> > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > + u64 addr)
> > +{
> > + cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > + cmd->dword1 = addr;
> > +}
> > +
>
> This needs to be (addr >> 2) to match the spec, same as in the iofence
> command.
>

oops. Thanks!

> Regards,
> Nick
>

regards,
- Tomasz

2023-07-20 19:03:46

by Tomasz Jeznach

[permalink] [raw]
Subject: Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface

On Wed, Jul 19, 2023 at 11:38 PM Krzysztof Kozlowski <[email protected]> wrote:
>
> On 19/07/2023 21:33, Tomasz Jeznach wrote:
> > Enable sysfs debug / visibility interface providing restricted
> > access to hardware registers.
>
> Please use subject prefixes matching the subsystem. You can get them for
> example with `git log --oneline -- DIRECTORY_OR_FILE` on the directory
> your patch is touching.
>

ack.

> >
> > Signed-off-by: Tomasz Jeznach <[email protected]>
> > ---
> > drivers/iommu/riscv/Makefile | 2 +-
> > drivers/iommu/riscv/iommu-sysfs.c | 183 ++++++++++++++++++++++++++++++
> > drivers/iommu/riscv/iommu.c | 7 ++
> > drivers/iommu/riscv/iommu.h | 2 +
> > 4 files changed, 193 insertions(+), 1 deletion(-)
> > create mode 100644 drivers/iommu/riscv/iommu-sysfs.c
> >
> > diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> > index 38730c11e4a8..9523eb053cfc 100644
> > --- a/drivers/iommu/riscv/Makefile
> > +++ b/drivers/iommu/riscv/Makefile
> > @@ -1 +1 @@
> > -obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o
> > \ No newline at end of file
> > +obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o
> > \ No newline at end of file
>
> You have this error in multiple places.
>

ack. next version will run through checkpatch.pl, should spot such problems.

> > diff --git a/drivers/iommu/riscv/iommu-sysfs.c b/drivers/iommu/riscv/iommu-sysfs.c
> > new file mode 100644
> > index 000000000000..f038ea8445c5
> > --- /dev/null
> > +++ b/drivers/iommu/riscv/iommu-sysfs.c
> > @@ -0,0 +1,183 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * IOMMU API for RISC-V architected Ziommu implementations.
> > + *
> > + * Copyright © 2022-2023 Rivos Inc.
> > + *
> > + * Author: Tomasz Jeznach <[email protected]>
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/kernel.h>
> > +#include <linux/compiler.h>
> > +#include <linux/iommu.h>
> > +#include <linux/platform_device.h>
> > +#include <asm/page.h>
> > +
> > +#include "iommu.h"
> > +
> > +#define sysfs_dev_to_iommu(dev) \
> > + container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> > +
> > +static ssize_t address_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
>
>
> Where is the sysfs ABI documented?
>

Sysfs for now is used only to expose selected IOMMU memory mapped
registers, with complete documentation in the RISC-V IOMMU Arch Spec
[1], and some comments in iommu-bits.h file.
LMK If it would be better to put a dedicated file documenting those
with the patch itself.


[1] https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf

>
> Best regards,
> Krzysztof
>

regards,
- Tomasz

2023-07-20 19:29:13

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Thu, Jul 20, 2023 at 11:00:10AM -0700, Tomasz Jeznach wrote:
> On Wed, Jul 19, 2023 at 8:12 PM Nick Kossifidis <[email protected]> wrote:
> > On 7/19/23 22:33, Tomasz Jeznach wrote:
> > The description doesn't match the subject nor the patch content (we
> > don't jus enable interrupts, we also init the queues).
> >
> > > + /* Parse Queue lengts */
> > > + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > > + if (!ret)
> > > + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > > +
> > > + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > > + if (!ret)
> > > + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > > +
> > > + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > > + if (!ret)
> > > + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > > +
> > > dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > >
> >
> > We need to add those to the device tree binding doc (or throw them away,
> > I thought it would be better to have them as part of the device
> > desciption than a module parameter).

Aye, I didn't notice these. Any DT properties /must/ be documented.
To avoid having to make the comments on v2, properties should also not
contain underscores.

> We can add them as an optional fields to DT.
> Alternatively, I've been looking into an option to auto-scale CQ/PQ
> based on number of attached devices, but this gets trickier for
> hot-pluggable systems. I've added module parameters as a bare-minimum,
> but still looking for better solutions.

If they're properties of the hardware, they should come from DT/ACPI,
unless they're auto-detectable, in which case that is preferred.
To quote GregKH "please do not add new module parameters for drivers,
this is not the 1990s" :)


Attachments:
(No filename) (1.94 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-20 21:51:46

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface

On 20/07/2023 20:30, Tomasz Jeznach wrote:
u.h"
>>> +
>>> +#define sysfs_dev_to_iommu(dev) \
>>> + container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
>>> +
>>> +static ssize_t address_show(struct device *dev,
>>> + struct device_attribute *attr, char *buf)
>>
>>
>> Where is the sysfs ABI documented?
>>
>
> Sysfs for now is used only to expose selected IOMMU memory mapped
> registers, with complete documentation in the RISC-V IOMMU Arch Spec
> [1], and some comments in iommu-bits.h file.
> LMK If it would be better to put a dedicated file documenting those
> with the patch itself.

I meant, you created new sysfs interface. Maybe I missed something in
the patchset, but each new sysfs interface required documenting in
Documentation/ABI/.

Best regards,
Krzysztof


2023-07-20 22:31:26

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface

On Thu, Jul 20, 2023 at 11:37:50PM +0200, Krzysztof Kozlowski wrote:
> On 20/07/2023 20:30, Tomasz Jeznach wrote:

> >>> +#define sysfs_dev_to_iommu(dev) \
> >>> + container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> >>> +
> >>> +static ssize_t address_show(struct device *dev,
> >>> + struct device_attribute *attr, char *buf)
> >>
> >>
> >> Where is the sysfs ABI documented?
> >>
> >
> > Sysfs for now is used only to expose selected IOMMU memory mapped
> > registers, with complete documentation in the RISC-V IOMMU Arch Spec
> > [1], and some comments in iommu-bits.h file.
> > LMK If it would be better to put a dedicated file documenting those
> > with the patch itself.
>
> I meant, you created new sysfs interface. Maybe I missed something in
> the patchset, but each new sysfs interface required documenting in
> Documentation/ABI/.

| expose selected IOMMU memory mapped registers

| Enable sysfs debug / visibility interface providing restricted
| access to hardware registers.

Documentation requirements of sysfs stuff aside, I'm not sure that we
even want a sysfs interface for this in the first place? Seems like, if
at all, this should be debugfs instead? Seems like the only use case for
it is debugging/development...


Attachments:
(No filename) (1.29 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-21 04:48:49

by Tomasz Jeznach

[permalink] [raw]
Subject: Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface

On Thu, Jul 20, 2023 at 3:08 PM Conor Dooley <[email protected]> wrote:
>
> On Thu, Jul 20, 2023 at 11:37:50PM +0200, Krzysztof Kozlowski wrote:
> > On 20/07/2023 20:30, Tomasz Jeznach wrote:
>
> > >>> +#define sysfs_dev_to_iommu(dev) \
> > >>> + container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> > >>> +
> > >>> +static ssize_t address_show(struct device *dev,
> > >>> + struct device_attribute *attr, char *buf)
> > >>
> > >>
> > >> Where is the sysfs ABI documented?
> > >>
> > >
> > > Sysfs for now is used only to expose selected IOMMU memory mapped
> > > registers, with complete documentation in the RISC-V IOMMU Arch Spec
> > > [1], and some comments in iommu-bits.h file.
> > > LMK If it would be better to put a dedicated file documenting those
> > > with the patch itself.
> >
> > I meant, you created new sysfs interface. Maybe I missed something in
> > the patchset, but each new sysfs interface required documenting in
> > Documentation/ABI/.
>
> | expose selected IOMMU memory mapped registers
>
> | Enable sysfs debug / visibility interface providing restricted
> | access to hardware registers.
>
> Documentation requirements of sysfs stuff aside, I'm not sure that we
> even want a sysfs interface for this in the first place? Seems like, if
> at all, this should be debugfs instead? Seems like the only use case for
> it is debugging/development...

Thanks Conor, will switch to debugfs. This will be a more suitable interface.

regards,
- Tomasz

2023-07-24 08:38:54

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <[email protected]> wrote:
>
> From: Anup Patel <[email protected]>
>
> We add DT bindings document for RISC-V IOMMU platform and PCI devices
> defined by the RISC-V IOMMU specification.
>
> Signed-off-by: Anup Patel <[email protected]>
> ---
> .../bindings/iommu/riscv,iommu.yaml | 146 ++++++++++++++++++
> 1 file changed, 146 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
>
> diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> new file mode 100644
> index 000000000000..8a9aedb61768
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> @@ -0,0 +1,146 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: RISC-V IOMMU Implementation
> +
> +maintainers:
> + - Tomasz Jeznach <[email protected]>
> +
> +description:
> + The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
> + which can be a regular platform device or a PCI device connected to
> + the host root port.
> +
> + The RISC-V IOMMU provides two stage translation, device directory table,
> + command queue and fault reporting as wired interrupt or MSIx event for
> + both PCI and platform devices.
> +
> + Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> +
> +properties:
> + compatible:
> + oneOf:
> + - description: RISC-V IOMMU as a platform device
> + items:
> + - enum:
> + - vendor,chip-iommu
> + - const: riscv,iommu
> +
> + - description: RISC-V IOMMU as a PCI device connected to root port
> + items:
> + - enum:
> + - vendor,chip-pci-iommu
> + - const: riscv,pci-iommu
> +
> + reg:
> + maxItems: 1
> + description:
> + For RISC-V IOMMU as a platform device, this represents the MMIO base
> + address of registers.
> +
> + For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> + details as described in Documentation/devicetree/bindings/pci/pci.txt
> +
> + '#iommu-cells':
> + const: 2
> + description: |
> + Each IOMMU specifier represents the base device ID and number of
> + device IDs.
> +
> + interrupts:
> + minItems: 1
> + maxItems: 16
> + description:
> + The presence of this property implies that given RISC-V IOMMU uses
> + wired interrupts to notify the RISC-V HARTS (or CPUs).
> +
> + msi-parent:
> + description:
> + The presence of this property implies that given RISC-V IOMMU uses
> + MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> + considered only when the interrupts property is absent.
> +
> + dma-coherent:
> + description:
> + Present if page table walks and DMA accessed made by the RISC-V IOMMU
> + are cache coherent with the CPU.
> +
> + power-domains:
> + maxItems: 1
> +

In RISC-V IOMMU, certain devices can be set to bypass mode when the
IOMMU is in translation mode. To identify the devices that require
bypass mode by default, does it be sensible to add a property to
indicate this behavior?

> +required:
> + - compatible
> + - reg
> + - '#iommu-cells'
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + /* Example 1 (IOMMU platform device with wired interrupts) */
> + immu1: iommu@1bccd000 {
> + compatible = "vendor,chip-iommu", "riscv,iommu";
> + reg = <0x1bccd000 0x1000>;
> + interrupt-parent = <&aplic_smode>;
> + interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> + #iommu-cells = <2>;
> + };
> +
> + /* Device with two IOMMU device IDs, 0 and 7 */
> + master1 {
> + iommus = <&immu1 0 1>, <&immu1 7 1>;
> + };
> +
> + - |
> + /* Example 2 (IOMMU platform device with MSIs) */
> + immu2: iommu@1bcdd000 {
> + compatible = "vendor,chip-iommu", "riscv,iommu";
> + reg = <0x1bccd000 0x1000>;
> + msi-parent = <&imsics_smode>;
> + #iommu-cells = <2>;
> + };
> +
> + bus {
> + #address-cells = <2>;
> + #size-cells = <2>;
> +
> + /* Device with IOMMU device IDs ranging from 32 to 64 */
> + master1 {
> + iommus = <&immu2 32 32>;
> + };
> +
> + pcie@40000000 {
> + compatible = "pci-host-cam-generic";
> + device_type = "pci";
> + #address-cells = <3>;
> + #size-cells = <2>;
> + bus-range = <0x0 0x1>;
> +
> + /* CPU_PHYSICAL(2) SIZE(2) */
> + reg = <0x0 0x40000000 0x0 0x1000000>;
> +
> + /* BUS_ADDRESS(3) CPU_PHYSICAL(2) SIZE(2) */
> + ranges = <0x01000000 0x0 0x01000000 0x0 0x01000000 0x0 0x00010000>,
> + <0x02000000 0x0 0x41000000 0x0 0x41000000 0x0 0x3f000000>;
> +
> + #interrupt-cells = <0x1>;
> +
> + /* PCI_DEVICE(3) INT#(1) CONTROLLER(PHANDLE) CONTROLLER_DATA(2) */
> + interrupt-map = < 0x0 0x0 0x0 0x1 &aplic_smode 0x4 0x1>,
> + < 0x800 0x0 0x0 0x1 &aplic_smode 0x5 0x1>,
> + <0x1000 0x0 0x0 0x1 &aplic_smode 0x6 0x1>,
> + <0x1800 0x0 0x0 0x1 &aplic_smode 0x7 0x1>;
> +
> + /* PCI_DEVICE(3) INT#(1) */
> + interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
> +
> + msi-parent = <&imsics_smode>;
> +
> + /* Devices with bus number 0-127 are mastered via immu2 */
> + iommu-map = <0x0000 &immu2 0x0000 0x8000>;
> + };
> + };
> +...
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-24 10:24:46

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Fri, Jul 21, 2023 at 2:00 AM Tomasz Jeznach <[email protected]> wrote:
>
> On Wed, Jul 19, 2023 at 8:12 PM Nick Kossifidis <[email protected]> wrote:
> >
> > Hello Tomasz,
> >
> > On 7/19/23 22:33, Tomasz Jeznach wrote:
> > > Enables message or wire signal interrupts for PCIe and platforms devices.
> > >
> >
> > The description doesn't match the subject nor the patch content (we
> > don't jus enable interrupts, we also init the queues).
> >
> > > + /* Parse Queue lengts */
> > > + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > > + if (!ret)
> > > + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > > +
> > > + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > > + if (!ret)
> > > + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > > +
> > > + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > > + if (!ret)
> > > + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > > +
> > > dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > >
> >
> > We need to add those to the device tree binding doc (or throw them away,
> > I thought it would be better to have them as part of the device
> > desciption than a module parameter).
> >
>
> We can add them as an optional fields to DT.
> Alternatively, I've been looking into an option to auto-scale CQ/PQ
> based on number of attached devices, but this gets trickier for
> hot-pluggable systems. I've added module parameters as a bare-minimum,
> but still looking for better solutions.
>
> >
> > > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > > +
> >
> > > + case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > > + q = &iommu->priq;
> > > + q->len = sizeof(struct riscv_iommu_pq_record);
> > > + count = iommu->priq_len;
> > > + irq = iommu->irq_priq;
> > > + irq_check = riscv_iommu_priq_irq_check;
> > > + irq_process = riscv_iommu_priq_process;
> > > + q->qbr = RISCV_IOMMU_REG_PQB;
> > > + q->qcr = RISCV_IOMMU_REG_PQCSR;
> > > + name = "priq";
> > > + break;
> >
> >
> > It makes more sense to add the code for the page request queue in the
> > patch that adds ATS/PRI support IMHO. This comment also applies to its
> > interrupt handlers below.
> >
>
> ack. will do.
>
> >
> > > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > > + u64 addr)
> > > +{
> > > + cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > > + cmd->dword1 = addr;
> > > +}
> > > +
> >
> > This needs to be (addr >> 2) to match the spec, same as in the iofence
> > command.
> >
>
> oops. Thanks!
>

I think it should be (addr >> 12) according to the spec.

> > Regards,
> > Nick
> >
>
> regards,
> - Tomasz
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-24 10:31:44

by Anup Patel

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Mon, Jul 24, 2023 at 1:33 PM Zong Li <[email protected]> wrote:
>
> On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <[email protected]> wrote:
> >
> > From: Anup Patel <[email protected]>
> >
> > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > defined by the RISC-V IOMMU specification.
> >
> > Signed-off-by: Anup Patel <[email protected]>
> > ---
> > .../bindings/iommu/riscv,iommu.yaml | 146 ++++++++++++++++++
> > 1 file changed, 146 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > new file mode 100644
> > index 000000000000..8a9aedb61768
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > @@ -0,0 +1,146 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: RISC-V IOMMU Implementation
> > +
> > +maintainers:
> > + - Tomasz Jeznach <[email protected]>
> > +
> > +description:
> > + The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
> > + which can be a regular platform device or a PCI device connected to
> > + the host root port.
> > +
> > + The RISC-V IOMMU provides two stage translation, device directory table,
> > + command queue and fault reporting as wired interrupt or MSIx event for
> > + both PCI and platform devices.
> > +
> > + Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > +
> > +properties:
> > + compatible:
> > + oneOf:
> > + - description: RISC-V IOMMU as a platform device
> > + items:
> > + - enum:
> > + - vendor,chip-iommu
> > + - const: riscv,iommu
> > +
> > + - description: RISC-V IOMMU as a PCI device connected to root port
> > + items:
> > + - enum:
> > + - vendor,chip-pci-iommu
> > + - const: riscv,pci-iommu
> > +
> > + reg:
> > + maxItems: 1
> > + description:
> > + For RISC-V IOMMU as a platform device, this represents the MMIO base
> > + address of registers.
> > +
> > + For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> > + details as described in Documentation/devicetree/bindings/pci/pci.txt
> > +
> > + '#iommu-cells':
> > + const: 2
> > + description: |
> > + Each IOMMU specifier represents the base device ID and number of
> > + device IDs.
> > +
> > + interrupts:
> > + minItems: 1
> > + maxItems: 16
> > + description:
> > + The presence of this property implies that given RISC-V IOMMU uses
> > + wired interrupts to notify the RISC-V HARTS (or CPUs).
> > +
> > + msi-parent:
> > + description:
> > + The presence of this property implies that given RISC-V IOMMU uses
> > + MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > + considered only when the interrupts property is absent.
> > +
> > + dma-coherent:
> > + description:
> > + Present if page table walks and DMA accessed made by the RISC-V IOMMU
> > + are cache coherent with the CPU.
> > +
> > + power-domains:
> > + maxItems: 1
> > +
>
> In RISC-V IOMMU, certain devices can be set to bypass mode when the
> IOMMU is in translation mode. To identify the devices that require
> bypass mode by default, does it be sensible to add a property to
> indicate this behavior?

Bypass mode for a device is a property of that device (similar to dma-coherent)
and not of the IOMMU. Other architectures (ARM and x86) never added such
a device property for bypass mode so I guess it is NOT ADVISABLE to do it.

If this is REALLY required then we can do something similar to the QCOM
SMMU driver where they have a whitelist of devices which are allowed to
be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based their device
compatible string and any device outside this whitelist is blocked by default.

Regards,
Anup

>
> > +required:
> > + - compatible
> > + - reg
> > + - '#iommu-cells'
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > + - |
> > + /* Example 1 (IOMMU platform device with wired interrupts) */
> > + immu1: iommu@1bccd000 {
> > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > + reg = <0x1bccd000 0x1000>;
> > + interrupt-parent = <&aplic_smode>;
> > + interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > + #iommu-cells = <2>;
> > + };
> > +
> > + /* Device with two IOMMU device IDs, 0 and 7 */
> > + master1 {
> > + iommus = <&immu1 0 1>, <&immu1 7 1>;
> > + };
> > +
> > + - |
> > + /* Example 2 (IOMMU platform device with MSIs) */
> > + immu2: iommu@1bcdd000 {
> > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > + reg = <0x1bccd000 0x1000>;
> > + msi-parent = <&imsics_smode>;
> > + #iommu-cells = <2>;
> > + };
> > +
> > + bus {
> > + #address-cells = <2>;
> > + #size-cells = <2>;
> > +
> > + /* Device with IOMMU device IDs ranging from 32 to 64 */
> > + master1 {
> > + iommus = <&immu2 32 32>;
> > + };
> > +
> > + pcie@40000000 {
> > + compatible = "pci-host-cam-generic";
> > + device_type = "pci";
> > + #address-cells = <3>;
> > + #size-cells = <2>;
> > + bus-range = <0x0 0x1>;
> > +
> > + /* CPU_PHYSICAL(2) SIZE(2) */
> > + reg = <0x0 0x40000000 0x0 0x1000000>;
> > +
> > + /* BUS_ADDRESS(3) CPU_PHYSICAL(2) SIZE(2) */
> > + ranges = <0x01000000 0x0 0x01000000 0x0 0x01000000 0x0 0x00010000>,
> > + <0x02000000 0x0 0x41000000 0x0 0x41000000 0x0 0x3f000000>;
> > +
> > + #interrupt-cells = <0x1>;
> > +
> > + /* PCI_DEVICE(3) INT#(1) CONTROLLER(PHANDLE) CONTROLLER_DATA(2) */
> > + interrupt-map = < 0x0 0x0 0x0 0x1 &aplic_smode 0x4 0x1>,
> > + < 0x800 0x0 0x0 0x1 &aplic_smode 0x5 0x1>,
> > + <0x1000 0x0 0x0 0x1 &aplic_smode 0x6 0x1>,
> > + <0x1800 0x0 0x0 0x1 &aplic_smode 0x7 0x1>;
> > +
> > + /* PCI_DEVICE(3) INT#(1) */
> > + interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
> > +
> > + msi-parent = <&imsics_smode>;
> > +
> > + /* Devices with bus number 0-127 are mastered via immu2 */
> > + iommu-map = <0x0000 &immu2 0x0000 0x8000>;
> > + };
> > + };
> > +...
> > --
> > 2.34.1
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-24 11:37:18

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Mon, Jul 24, 2023 at 6:02 PM Anup Patel <[email protected]> wrote:
>
> On Mon, Jul 24, 2023 at 1:33 PM Zong Li <[email protected]> wrote:
> >
> > On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <[email protected]> wrote:
> > >
> > > From: Anup Patel <[email protected]>
> > >
> > > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > > defined by the RISC-V IOMMU specification.
> > >
> > > Signed-off-by: Anup Patel <[email protected]>
> > > ---
> > > .../bindings/iommu/riscv,iommu.yaml | 146 ++++++++++++++++++
> > > 1 file changed, 146 insertions(+)
> > > create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > >
> > > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > new file mode 100644
> > > index 000000000000..8a9aedb61768
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > @@ -0,0 +1,146 @@
> > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > +
> > > +title: RISC-V IOMMU Implementation
> > > +
> > > +maintainers:
> > > + - Tomasz Jeznach <[email protected]>
> > > +
> > > +description:
> > > + The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
> > > + which can be a regular platform device or a PCI device connected to
> > > + the host root port.
> > > +
> > > + The RISC-V IOMMU provides two stage translation, device directory table,
> > > + command queue and fault reporting as wired interrupt or MSIx event for
> > > + both PCI and platform devices.
> > > +
> > > + Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > > +
> > > +properties:
> > > + compatible:
> > > + oneOf:
> > > + - description: RISC-V IOMMU as a platform device
> > > + items:
> > > + - enum:
> > > + - vendor,chip-iommu
> > > + - const: riscv,iommu
> > > +
> > > + - description: RISC-V IOMMU as a PCI device connected to root port
> > > + items:
> > > + - enum:
> > > + - vendor,chip-pci-iommu
> > > + - const: riscv,pci-iommu
> > > +
> > > + reg:
> > > + maxItems: 1
> > > + description:
> > > + For RISC-V IOMMU as a platform device, this represents the MMIO base
> > > + address of registers.
> > > +
> > > + For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> > > + details as described in Documentation/devicetree/bindings/pci/pci.txt
> > > +
> > > + '#iommu-cells':
> > > + const: 2
> > > + description: |
> > > + Each IOMMU specifier represents the base device ID and number of
> > > + device IDs.
> > > +
> > > + interrupts:
> > > + minItems: 1
> > > + maxItems: 16
> > > + description:
> > > + The presence of this property implies that given RISC-V IOMMU uses
> > > + wired interrupts to notify the RISC-V HARTS (or CPUs).
> > > +
> > > + msi-parent:
> > > + description:
> > > + The presence of this property implies that given RISC-V IOMMU uses
> > > + MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > > + considered only when the interrupts property is absent.
> > > +
> > > + dma-coherent:
> > > + description:
> > > + Present if page table walks and DMA accessed made by the RISC-V IOMMU
> > > + are cache coherent with the CPU.
> > > +
> > > + power-domains:
> > > + maxItems: 1
> > > +
> >
> > In RISC-V IOMMU, certain devices can be set to bypass mode when the
> > IOMMU is in translation mode. To identify the devices that require
> > bypass mode by default, does it be sensible to add a property to
> > indicate this behavior?
>
> Bypass mode for a device is a property of that device (similar to dma-coherent)
> and not of the IOMMU. Other architectures (ARM and x86) never added such
> a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
>
> If this is REALLY required then we can do something similar to the QCOM
> SMMU driver where they have a whitelist of devices which are allowed to
> be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based their device
> compatible string and any device outside this whitelist is blocked by default.
>

I have considered that adding the property of bypass mode to that
device would be more appropriate. However, if we want to define this
property for the device, it might need to go through the generic IOMMU
dt-bindings, but I'm not sure if other IOMMU devices need this. I am
bringing up this topic here because I would like to explore if there
are any solutions on the IOMMU side, such as a property that indicates
the phandle of devices wishing to set bypass mode, somewhat similar to
the whitelist you mentioned earlier. Do you think we should address
this? After all, this is a case of RISC-V IOMMU supported.

> Regards,
> Anup
>
> >
> > > +required:
> > > + - compatible
> > > + - reg
> > > + - '#iommu-cells'
> > > +
> > > +additionalProperties: false
> > > +
> > > +examples:
> > > + - |
> > > + /* Example 1 (IOMMU platform device with wired interrupts) */
> > > + immu1: iommu@1bccd000 {
> > > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > > + reg = <0x1bccd000 0x1000>;
> > > + interrupt-parent = <&aplic_smode>;
> > > + interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > > + #iommu-cells = <2>;
> > > + };
> > > +
> > > + /* Device with two IOMMU device IDs, 0 and 7 */
> > > + master1 {
> > > + iommus = <&immu1 0 1>, <&immu1 7 1>;
> > > + };
> > > +
> > > + - |
> > > + /* Example 2 (IOMMU platform device with MSIs) */
> > > + immu2: iommu@1bcdd000 {
> > > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > > + reg = <0x1bccd000 0x1000>;
> > > + msi-parent = <&imsics_smode>;
> > > + #iommu-cells = <2>;
> > > + };
> > > +
> > > + bus {
> > > + #address-cells = <2>;
> > > + #size-cells = <2>;
> > > +
> > > + /* Device with IOMMU device IDs ranging from 32 to 64 */
> > > + master1 {
> > > + iommus = <&immu2 32 32>;
> > > + };
> > > +
> > > + pcie@40000000 {
> > > + compatible = "pci-host-cam-generic";
> > > + device_type = "pci";
> > > + #address-cells = <3>;
> > > + #size-cells = <2>;
> > > + bus-range = <0x0 0x1>;
> > > +
> > > + /* CPU_PHYSICAL(2) SIZE(2) */
> > > + reg = <0x0 0x40000000 0x0 0x1000000>;
> > > +
> > > + /* BUS_ADDRESS(3) CPU_PHYSICAL(2) SIZE(2) */
> > > + ranges = <0x01000000 0x0 0x01000000 0x0 0x01000000 0x0 0x00010000>,
> > > + <0x02000000 0x0 0x41000000 0x0 0x41000000 0x0 0x3f000000>;
> > > +
> > > + #interrupt-cells = <0x1>;
> > > +
> > > + /* PCI_DEVICE(3) INT#(1) CONTROLLER(PHANDLE) CONTROLLER_DATA(2) */
> > > + interrupt-map = < 0x0 0x0 0x0 0x1 &aplic_smode 0x4 0x1>,
> > > + < 0x800 0x0 0x0 0x1 &aplic_smode 0x5 0x1>,
> > > + <0x1000 0x0 0x0 0x1 &aplic_smode 0x6 0x1>,
> > > + <0x1800 0x0 0x0 0x1 &aplic_smode 0x7 0x1>;
> > > +
> > > + /* PCI_DEVICE(3) INT#(1) */
> > > + interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
> > > +
> > > + msi-parent = <&imsics_smode>;
> > > +
> > > + /* Devices with bus number 0-127 are mastered via immu2 */
> > > + iommu-map = <0x0000 &immu2 0x0000 0x8000>;
> > > + };
> > > + };
> > > +...
> > > --
> > > 2.34.1
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > [email protected]
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-24 12:42:55

by Anup Patel

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Mon, Jul 24, 2023 at 5:01 PM Zong Li <[email protected]> wrote:
>
> On Mon, Jul 24, 2023 at 6:02 PM Anup Patel <[email protected]> wrote:
> >
> > On Mon, Jul 24, 2023 at 1:33 PM Zong Li <[email protected]> wrote:
> > >
> > > On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <[email protected]> wrote:
> > > >
> > > > From: Anup Patel <[email protected]>
> > > >
> > > > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > > > defined by the RISC-V IOMMU specification.
> > > >
> > > > Signed-off-by: Anup Patel <[email protected]>
> > > > ---
> > > > .../bindings/iommu/riscv,iommu.yaml | 146 ++++++++++++++++++
> > > > 1 file changed, 146 insertions(+)
> > > > create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > > new file mode 100644
> > > > index 000000000000..8a9aedb61768
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > > @@ -0,0 +1,146 @@
> > > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > +
> > > > +title: RISC-V IOMMU Implementation
> > > > +
> > > > +maintainers:
> > > > + - Tomasz Jeznach <[email protected]>
> > > > +
> > > > +description:
> > > > + The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
> > > > + which can be a regular platform device or a PCI device connected to
> > > > + the host root port.
> > > > +
> > > > + The RISC-V IOMMU provides two stage translation, device directory table,
> > > > + command queue and fault reporting as wired interrupt or MSIx event for
> > > > + both PCI and platform devices.
> > > > +
> > > > + Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > > > +
> > > > +properties:
> > > > + compatible:
> > > > + oneOf:
> > > > + - description: RISC-V IOMMU as a platform device
> > > > + items:
> > > > + - enum:
> > > > + - vendor,chip-iommu
> > > > + - const: riscv,iommu
> > > > +
> > > > + - description: RISC-V IOMMU as a PCI device connected to root port
> > > > + items:
> > > > + - enum:
> > > > + - vendor,chip-pci-iommu
> > > > + - const: riscv,pci-iommu
> > > > +
> > > > + reg:
> > > > + maxItems: 1
> > > > + description:
> > > > + For RISC-V IOMMU as a platform device, this represents the MMIO base
> > > > + address of registers.
> > > > +
> > > > + For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> > > > + details as described in Documentation/devicetree/bindings/pci/pci.txt
> > > > +
> > > > + '#iommu-cells':
> > > > + const: 2
> > > > + description: |
> > > > + Each IOMMU specifier represents the base device ID and number of
> > > > + device IDs.
> > > > +
> > > > + interrupts:
> > > > + minItems: 1
> > > > + maxItems: 16
> > > > + description:
> > > > + The presence of this property implies that given RISC-V IOMMU uses
> > > > + wired interrupts to notify the RISC-V HARTS (or CPUs).
> > > > +
> > > > + msi-parent:
> > > > + description:
> > > > + The presence of this property implies that given RISC-V IOMMU uses
> > > > + MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > > > + considered only when the interrupts property is absent.
> > > > +
> > > > + dma-coherent:
> > > > + description:
> > > > + Present if page table walks and DMA accessed made by the RISC-V IOMMU
> > > > + are cache coherent with the CPU.
> > > > +
> > > > + power-domains:
> > > > + maxItems: 1
> > > > +
> > >
> > > In RISC-V IOMMU, certain devices can be set to bypass mode when the
> > > IOMMU is in translation mode. To identify the devices that require
> > > bypass mode by default, does it be sensible to add a property to
> > > indicate this behavior?
> >
> > Bypass mode for a device is a property of that device (similar to dma-coherent)
> > and not of the IOMMU. Other architectures (ARM and x86) never added such
> > a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
> >
> > If this is REALLY required then we can do something similar to the QCOM
> > SMMU driver where they have a whitelist of devices which are allowed to
> > be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based their device
> > compatible string and any device outside this whitelist is blocked by default.
> >
>
> I have considered that adding the property of bypass mode to that
> device would be more appropriate. However, if we want to define this
> property for the device, it might need to go through the generic IOMMU
> dt-bindings, but I'm not sure if other IOMMU devices need this. I am
> bringing up this topic here because I would like to explore if there
> are any solutions on the IOMMU side, such as a property that indicates
> the phandle of devices wishing to set bypass mode, somewhat similar to
> the whitelist you mentioned earlier. Do you think we should address
> this? After all, this is a case of RISC-V IOMMU supported.

Bypass mode is a common feature across IOMMUs. Other IOMMUs don't
have a special property for bypass mode at device-level or at IOMMU level,
which clearly indicates that defining a RISC-V specific property is not the
right way to go.

The real question is how do we set IOMMU_DOMAIN_IDENTITY (i.e.
bypass/identity domain) as the default domain for certain devices ?

One possible option is to implement def_domain_type() IOMMU operation
for RISC-V IOMMU which will return IOMMU_DOMAIN_IDENTITY for
certain devices based on compatible string matching (i.e. whitelist of
devices). As an example, refer qcom_smmu_def_domain_type()
of drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c

Regards,
Anup





>
> > Regards,
> > Anup
> >
> > >
> > > > +required:
> > > > + - compatible
> > > > + - reg
> > > > + - '#iommu-cells'
> > > > +
> > > > +additionalProperties: false
> > > > +
> > > > +examples:
> > > > + - |
> > > > + /* Example 1 (IOMMU platform device with wired interrupts) */
> > > > + immu1: iommu@1bccd000 {
> > > > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > > > + reg = <0x1bccd000 0x1000>;
> > > > + interrupt-parent = <&aplic_smode>;
> > > > + interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > > > + #iommu-cells = <2>;
> > > > + };
> > > > +
> > > > + /* Device with two IOMMU device IDs, 0 and 7 */
> > > > + master1 {
> > > > + iommus = <&immu1 0 1>, <&immu1 7 1>;
> > > > + };
> > > > +
> > > > + - |
> > > > + /* Example 2 (IOMMU platform device with MSIs) */
> > > > + immu2: iommu@1bcdd000 {
> > > > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > > > + reg = <0x1bccd000 0x1000>;
> > > > + msi-parent = <&imsics_smode>;
> > > > + #iommu-cells = <2>;
> > > > + };
> > > > +
> > > > + bus {
> > > > + #address-cells = <2>;
> > > > + #size-cells = <2>;
> > > > +
> > > > + /* Device with IOMMU device IDs ranging from 32 to 64 */
> > > > + master1 {
> > > > + iommus = <&immu2 32 32>;
> > > > + };
> > > > +
> > > > + pcie@40000000 {
> > > > + compatible = "pci-host-cam-generic";
> > > > + device_type = "pci";
> > > > + #address-cells = <3>;
> > > > + #size-cells = <2>;
> > > > + bus-range = <0x0 0x1>;
> > > > +
> > > > + /* CPU_PHYSICAL(2) SIZE(2) */
> > > > + reg = <0x0 0x40000000 0x0 0x1000000>;
> > > > +
> > > > + /* BUS_ADDRESS(3) CPU_PHYSICAL(2) SIZE(2) */
> > > > + ranges = <0x01000000 0x0 0x01000000 0x0 0x01000000 0x0 0x00010000>,
> > > > + <0x02000000 0x0 0x41000000 0x0 0x41000000 0x0 0x3f000000>;
> > > > +
> > > > + #interrupt-cells = <0x1>;
> > > > +
> > > > + /* PCI_DEVICE(3) INT#(1) CONTROLLER(PHANDLE) CONTROLLER_DATA(2) */
> > > > + interrupt-map = < 0x0 0x0 0x0 0x1 &aplic_smode 0x4 0x1>,
> > > > + < 0x800 0x0 0x0 0x1 &aplic_smode 0x5 0x1>,
> > > > + <0x1000 0x0 0x0 0x1 &aplic_smode 0x6 0x1>,
> > > > + <0x1800 0x0 0x0 0x1 &aplic_smode 0x7 0x1>;
> > > > +
> > > > + /* PCI_DEVICE(3) INT#(1) */
> > > > + interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
> > > > +
> > > > + msi-parent = <&imsics_smode>;
> > > > +
> > > > + /* Devices with bus number 0-127 are mastered via immu2 */
> > > > + iommu-map = <0x0000 &immu2 0x0000 0x8000>;
> > > > + };
> > > > + };
> > > > +...
> > > > --
> > > > 2.34.1
> > > >
> > > >
> > > > _______________________________________________
> > > > linux-riscv mailing list
> > > > [email protected]
> > > > http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-24 14:38:55

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Mon, Jul 24, 2023 at 8:10 PM Anup Patel <[email protected]> wrote:
>
> On Mon, Jul 24, 2023 at 5:01 PM Zong Li <[email protected]> wrote:
> >
> > On Mon, Jul 24, 2023 at 6:02 PM Anup Patel <[email protected]> wrote:
> > >
> > > On Mon, Jul 24, 2023 at 1:33 PM Zong Li <[email protected]> wrote:
> > > >
> > > > On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <[email protected]> wrote:
> > > > >
> > > > > From: Anup Patel <[email protected]>
> > > > >
> > > > > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > > > > defined by the RISC-V IOMMU specification.
> > > > >
> > > > > Signed-off-by: Anup Patel <[email protected]>
> > > > > ---
> > > > > .../bindings/iommu/riscv,iommu.yaml | 146 ++++++++++++++++++
> > > > > 1 file changed, 146 insertions(+)
> > > > > create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > > >
> > > > > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > > > new file mode 100644
> > > > > index 000000000000..8a9aedb61768
> > > > > --- /dev/null
> > > > > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > > > @@ -0,0 +1,146 @@
> > > > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > > +%YAML 1.2
> > > > > +---
> > > > > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > > +
> > > > > +title: RISC-V IOMMU Implementation
> > > > > +
> > > > > +maintainers:
> > > > > + - Tomasz Jeznach <[email protected]>
> > > > > +
> > > > > +description:
> > > > > + The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
> > > > > + which can be a regular platform device or a PCI device connected to
> > > > > + the host root port.
> > > > > +
> > > > > + The RISC-V IOMMU provides two stage translation, device directory table,
> > > > > + command queue and fault reporting as wired interrupt or MSIx event for
> > > > > + both PCI and platform devices.
> > > > > +
> > > > > + Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > > > > +
> > > > > +properties:
> > > > > + compatible:
> > > > > + oneOf:
> > > > > + - description: RISC-V IOMMU as a platform device
> > > > > + items:
> > > > > + - enum:
> > > > > + - vendor,chip-iommu
> > > > > + - const: riscv,iommu
> > > > > +
> > > > > + - description: RISC-V IOMMU as a PCI device connected to root port
> > > > > + items:
> > > > > + - enum:
> > > > > + - vendor,chip-pci-iommu
> > > > > + - const: riscv,pci-iommu
> > > > > +
> > > > > + reg:
> > > > > + maxItems: 1
> > > > > + description:
> > > > > + For RISC-V IOMMU as a platform device, this represents the MMIO base
> > > > > + address of registers.
> > > > > +
> > > > > + For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> > > > > + details as described in Documentation/devicetree/bindings/pci/pci.txt
> > > > > +
> > > > > + '#iommu-cells':
> > > > > + const: 2
> > > > > + description: |
> > > > > + Each IOMMU specifier represents the base device ID and number of
> > > > > + device IDs.
> > > > > +
> > > > > + interrupts:
> > > > > + minItems: 1
> > > > > + maxItems: 16
> > > > > + description:
> > > > > + The presence of this property implies that given RISC-V IOMMU uses
> > > > > + wired interrupts to notify the RISC-V HARTS (or CPUs).
> > > > > +
> > > > > + msi-parent:
> > > > > + description:
> > > > > + The presence of this property implies that given RISC-V IOMMU uses
> > > > > + MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > > > > + considered only when the interrupts property is absent.
> > > > > +
> > > > > + dma-coherent:
> > > > > + description:
> > > > > + Present if page table walks and DMA accessed made by the RISC-V IOMMU
> > > > > + are cache coherent with the CPU.
> > > > > +
> > > > > + power-domains:
> > > > > + maxItems: 1
> > > > > +
> > > >
> > > > In RISC-V IOMMU, certain devices can be set to bypass mode when the
> > > > IOMMU is in translation mode. To identify the devices that require
> > > > bypass mode by default, does it be sensible to add a property to
> > > > indicate this behavior?
> > >
> > > Bypass mode for a device is a property of that device (similar to dma-coherent)
> > > and not of the IOMMU. Other architectures (ARM and x86) never added such
> > > a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
> > >
> > > If this is REALLY required then we can do something similar to the QCOM
> > > SMMU driver where they have a whitelist of devices which are allowed to
> > > be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based their device
> > > compatible string and any device outside this whitelist is blocked by default.
> > >
> >
> > I have considered that adding the property of bypass mode to that
> > device would be more appropriate. However, if we want to define this
> > property for the device, it might need to go through the generic IOMMU
> > dt-bindings, but I'm not sure if other IOMMU devices need this. I am
> > bringing up this topic here because I would like to explore if there
> > are any solutions on the IOMMU side, such as a property that indicates
> > the phandle of devices wishing to set bypass mode, somewhat similar to
> > the whitelist you mentioned earlier. Do you think we should address
> > this? After all, this is a case of RISC-V IOMMU supported.
>
> Bypass mode is a common feature across IOMMUs. Other IOMMUs don't
> have a special property for bypass mode at device-level or at IOMMU level,
> which clearly indicates that defining a RISC-V specific property is not the
> right way to go.
>
> The real question is how do we set IOMMU_DOMAIN_IDENTITY (i.e.
> bypass/identity domain) as the default domain for certain devices ?
>
> One possible option is to implement def_domain_type() IOMMU operation
> for RISC-V IOMMU which will return IOMMU_DOMAIN_IDENTITY for
> certain devices based on compatible string matching (i.e. whitelist of
> devices). As an example, refer qcom_smmu_def_domain_type()
> of drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>

That is indeed one way to approach it, and we can modify the
compatible string when we want to change the mode. However, it would
be preferable to explore a more flexible approach to achieve this
goal. By doing so, we can avoid hard coding anything in the driver or
having to rebuild the kernel whenever we want to change the mode for
certain devices. While I have considered extending a cell in the
'iommus' property to indicate a device's desire to set bypass mode, it
doesn't comply with the iommu documentation and could lead to
ambiguous definitions.

If, at present, we are unable to find a suitable solution, perhaps
let's keep this topic in mind until we discover a more appropriate
approach. In the meantime, we can continue to explore other
possibilities to implement it. Thanks.

> Regards,
> Anup
>
>
>
>
>
> >
> > > Regards,
> > > Anup
> > >
> > > >
> > > > > +required:
> > > > > + - compatible
> > > > > + - reg
> > > > > + - '#iommu-cells'
> > > > > +
> > > > > +additionalProperties: false
> > > > > +
> > > > > +examples:
> > > > > + - |
> > > > > + /* Example 1 (IOMMU platform device with wired interrupts) */
> > > > > + immu1: iommu@1bccd000 {
> > > > > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > > > > + reg = <0x1bccd000 0x1000>;
> > > > > + interrupt-parent = <&aplic_smode>;
> > > > > + interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > > > > + #iommu-cells = <2>;
> > > > > + };
> > > > > +
> > > > > + /* Device with two IOMMU device IDs, 0 and 7 */
> > > > > + master1 {
> > > > > + iommus = <&immu1 0 1>, <&immu1 7 1>;
> > > > > + };
> > > > > +
> > > > > + - |
> > > > > + /* Example 2 (IOMMU platform device with MSIs) */
> > > > > + immu2: iommu@1bcdd000 {
> > > > > + compatible = "vendor,chip-iommu", "riscv,iommu";
> > > > > + reg = <0x1bccd000 0x1000>;
> > > > > + msi-parent = <&imsics_smode>;
> > > > > + #iommu-cells = <2>;
> > > > > + };
> > > > > +
> > > > > + bus {
> > > > > + #address-cells = <2>;
> > > > > + #size-cells = <2>;
> > > > > +
> > > > > + /* Device with IOMMU device IDs ranging from 32 to 64 */
> > > > > + master1 {
> > > > > + iommus = <&immu2 32 32>;
> > > > > + };
> > > > > +
> > > > > + pcie@40000000 {
> > > > > + compatible = "pci-host-cam-generic";
> > > > > + device_type = "pci";
> > > > > + #address-cells = <3>;
> > > > > + #size-cells = <2>;
> > > > > + bus-range = <0x0 0x1>;
> > > > > +
> > > > > + /* CPU_PHYSICAL(2) SIZE(2) */
> > > > > + reg = <0x0 0x40000000 0x0 0x1000000>;
> > > > > +
> > > > > + /* BUS_ADDRESS(3) CPU_PHYSICAL(2) SIZE(2) */
> > > > > + ranges = <0x01000000 0x0 0x01000000 0x0 0x01000000 0x0 0x00010000>,
> > > > > + <0x02000000 0x0 0x41000000 0x0 0x41000000 0x0 0x3f000000>;
> > > > > +
> > > > > + #interrupt-cells = <0x1>;
> > > > > +
> > > > > + /* PCI_DEVICE(3) INT#(1) CONTROLLER(PHANDLE) CONTROLLER_DATA(2) */
> > > > > + interrupt-map = < 0x0 0x0 0x0 0x1 &aplic_smode 0x4 0x1>,
> > > > > + < 0x800 0x0 0x0 0x1 &aplic_smode 0x5 0x1>,
> > > > > + <0x1000 0x0 0x0 0x1 &aplic_smode 0x6 0x1>,
> > > > > + <0x1800 0x0 0x0 0x1 &aplic_smode 0x7 0x1>;
> > > > > +
> > > > > + /* PCI_DEVICE(3) INT#(1) */
> > > > > + interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
> > > > > +
> > > > > + msi-parent = <&imsics_smode>;
> > > > > +
> > > > > + /* Devices with bus number 0-127 are mastered via immu2 */
> > > > > + iommu-map = <0x0000 &immu2 0x0000 0x8000>;
> > > > > + };
> > > > > + };
> > > > > +...
> > > > > --
> > > > > 2.34.1
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > linux-riscv mailing list
> > > > > [email protected]
> > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-26 03:55:38

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On 2023/7/24 21:23, Zong Li wrote:
>>>>> In RISC-V IOMMU, certain devices can be set to bypass mode when the
>>>>> IOMMU is in translation mode. To identify the devices that require
>>>>> bypass mode by default, does it be sensible to add a property to
>>>>> indicate this behavior?
>>>> Bypass mode for a device is a property of that device (similar to dma-coherent)
>>>> and not of the IOMMU. Other architectures (ARM and x86) never added such
>>>> a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
>>>>
>>>> If this is REALLY required then we can do something similar to the QCOM
>>>> SMMU driver where they have a whitelist of devices which are allowed to
>>>> be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based their device
>>>> compatible string and any device outside this whitelist is blocked by default.
>>>>
>>> I have considered that adding the property of bypass mode to that
>>> device would be more appropriate. However, if we want to define this
>>> property for the device, it might need to go through the generic IOMMU
>>> dt-bindings, but I'm not sure if other IOMMU devices need this. I am
>>> bringing up this topic here because I would like to explore if there
>>> are any solutions on the IOMMU side, such as a property that indicates
>>> the phandle of devices wishing to set bypass mode, somewhat similar to
>>> the whitelist you mentioned earlier. Do you think we should address
>>> this? After all, this is a case of RISC-V IOMMU supported.
>> Bypass mode is a common feature across IOMMUs. Other IOMMUs don't
>> have a special property for bypass mode at device-level or at IOMMU level,
>> which clearly indicates that defining a RISC-V specific property is not the
>> right way to go.
>>
>> The real question is how do we set IOMMU_DOMAIN_IDENTITY (i.e.
>> bypass/identity domain) as the default domain for certain devices ?
>>
>> One possible option is to implement def_domain_type() IOMMU operation
>> for RISC-V IOMMU which will return IOMMU_DOMAIN_IDENTITY for
>> certain devices based on compatible string matching (i.e. whitelist of
>> devices). As an example, refer qcom_smmu_def_domain_type()
>> of drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>
> That is indeed one way to approach it, and we can modify the
> compatible string when we want to change the mode. However, it would
> be preferable to explore a more flexible approach to achieve this
> goal. By doing so, we can avoid hard coding anything in the driver or
> having to rebuild the kernel whenever we want to change the mode for
> certain devices. While I have considered extending a cell in the
> 'iommus' property to indicate a device's desire to set bypass mode, it
> doesn't comply with the iommu documentation and could lead to
> ambiguous definitions.

Hard coding the matching strings in the iommu driver is definitely not a
preferable way. A feasible solution from current code's point of view is
that platform opt-in the device's special requirements through DT or
ACPI. And in the def_domain_type callback, let the iommu core know that,
hence it can allocate a right type of domain for the device.

Thoughts?

Best regards,
baolu

2023-07-26 05:09:09

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Wed, Jul 26, 2023 at 11:21 AM Baolu Lu <[email protected]> wrote:
>
> On 2023/7/24 21:23, Zong Li wrote:
> >>>>> In RISC-V IOMMU, certain devices can be set to bypass mode when the
> >>>>> IOMMU is in translation mode. To identify the devices that require
> >>>>> bypass mode by default, does it be sensible to add a property to
> >>>>> indicate this behavior?
> >>>> Bypass mode for a device is a property of that device (similar to dma-coherent)
> >>>> and not of the IOMMU. Other architectures (ARM and x86) never added such
> >>>> a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
> >>>>
> >>>> If this is REALLY required then we can do something similar to the QCOM
> >>>> SMMU driver where they have a whitelist of devices which are allowed to
> >>>> be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based their device
> >>>> compatible string and any device outside this whitelist is blocked by default.
> >>>>
> >>> I have considered that adding the property of bypass mode to that
> >>> device would be more appropriate. However, if we want to define this
> >>> property for the device, it might need to go through the generic IOMMU
> >>> dt-bindings, but I'm not sure if other IOMMU devices need this. I am
> >>> bringing up this topic here because I would like to explore if there
> >>> are any solutions on the IOMMU side, such as a property that indicates
> >>> the phandle of devices wishing to set bypass mode, somewhat similar to
> >>> the whitelist you mentioned earlier. Do you think we should address
> >>> this? After all, this is a case of RISC-V IOMMU supported.
> >> Bypass mode is a common feature across IOMMUs. Other IOMMUs don't
> >> have a special property for bypass mode at device-level or at IOMMU level,
> >> which clearly indicates that defining a RISC-V specific property is not the
> >> right way to go.
> >>
> >> The real question is how do we set IOMMU_DOMAIN_IDENTITY (i.e.
> >> bypass/identity domain) as the default domain for certain devices ?
> >>
> >> One possible option is to implement def_domain_type() IOMMU operation
> >> for RISC-V IOMMU which will return IOMMU_DOMAIN_IDENTITY for
> >> certain devices based on compatible string matching (i.e. whitelist of
> >> devices). As an example, refer qcom_smmu_def_domain_type()
> >> of drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >>
> > That is indeed one way to approach it, and we can modify the
> > compatible string when we want to change the mode. However, it would
> > be preferable to explore a more flexible approach to achieve this
> > goal. By doing so, we can avoid hard coding anything in the driver or
> > having to rebuild the kernel whenever we want to change the mode for
> > certain devices. While I have considered extending a cell in the
> > 'iommus' property to indicate a device's desire to set bypass mode, it
> > doesn't comply with the iommu documentation and could lead to
> > ambiguous definitions.
>
> Hard coding the matching strings in the iommu driver is definitely not a
> preferable way. A feasible solution from current code's point of view is
> that platform opt-in the device's special requirements through DT or
> ACPI. And in the def_domain_type callback, let the iommu core know that,
> hence it can allocate a right type of domain for the device.
>
> Thoughts?
>

It would be nice if we can deal with it at this time. As we discussed
earlier, we might need to consider how to indicate that, such as
putting a property in device side or iommu side, and whether we need
to define it in generic dt-binding instead of RISC-V specific
dt-binding.

> Best regards,
> baolu

2023-07-26 13:28:02

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Wed, Jul 26, 2023 at 12:26:14PM +0800, Zong Li wrote:
> On Wed, Jul 26, 2023 at 11:21 AM Baolu Lu <[email protected]> wrote:
> >
> > On 2023/7/24 21:23, Zong Li wrote:
> > >>>>> In RISC-V IOMMU, certain devices can be set to bypass mode when the
> > >>>>> IOMMU is in translation mode. To identify the devices that require
> > >>>>> bypass mode by default, does it be sensible to add a property to
> > >>>>> indicate this behavior?
> > >>>> Bypass mode for a device is a property of that device (similar to dma-coherent)
> > >>>> and not of the IOMMU. Other architectures (ARM and x86) never added such
> > >>>> a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
> > >>>>
> > >>>> If this is REALLY required then we can do something similar to the QCOM
> > >>>> SMMU driver where they have a whitelist of devices which are allowed to
> > >>>> be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based their device
> > >>>> compatible string and any device outside this whitelist is
> > >>>> blocked by default.

I have a draft patch someplace that consolidated all this quirk
checking into the core code. Generally the expectation is that any
device behind an iommu is fully functional in all modes. The existing
quirks are for HW defects that make some devices not work properly. In
this case the right outcome seems to be effectively blocking them from
using the iommu.

So, you should explain a lot more what "require bypass mode" means in
the RISCV world and why any device would need it.

Jason

2023-07-27 02:51:59

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Wed, Jul 26, 2023 at 8:17 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Wed, Jul 26, 2023 at 12:26:14PM +0800, Zong Li wrote:
> > On Wed, Jul 26, 2023 at 11:21 AM Baolu Lu <[email protected]> wrote:
> > >
> > > On 2023/7/24 21:23, Zong Li wrote:
> > > >>>>> In RISC-V IOMMU, certain devices can be set to bypass mode when the
> > > >>>>> IOMMU is in translation mode. To identify the devices that require
> > > >>>>> bypass mode by default, does it be sensible to add a property to
> > > >>>>> indicate this behavior?
> > > >>>> Bypass mode for a device is a property of that device (similar to dma-coherent)
> > > >>>> and not of the IOMMU. Other architectures (ARM and x86) never added such
> > > >>>> a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
> > > >>>>
> > > >>>> If this is REALLY required then we can do something similar to the QCOM
> > > >>>> SMMU driver where they have a whitelist of devices which are allowed to
> > > >>>> be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based their device
> > > >>>> compatible string and any device outside this whitelist is
> > > >>>> blocked by default.
>
> I have a draft patch someplace that consolidated all this quirk
> checking into the core code. Generally the expectation is that any
> device behind an iommu is fully functional in all modes. The existing
> quirks are for HW defects that make some devices not work properly. In
> this case the right outcome seems to be effectively blocking them from
> using the iommu.
>
> So, you should explain a lot more what "require bypass mode" means in
> the RISCV world and why any device would need it.

Perhaps this question could be related to the scenarios in which
devices wish to be in bypass mode when the IOMMU is in translation
mode, and why IOMMU defines/supports this case. Currently, I could
envision a scenario where a device is already connected to the IOMMU
in hardware, but it is not functioning correctly, or there are
performance impacts. If modifying the hardware is not feasible, a
default configuration that allows bypass mode could be provided as a
solution. There might be other scenarios that I might have overlooked.
It seems to me since IOMMU supports this configuration, it would be
advantageous to have an approach to achieve it, and DT might be a
flexible way.

>
> Jason

2023-07-28 06:06:35

by Tomasz Jeznach

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Mon, Jul 24, 2023 at 11:47 AM Zong Li <[email protected]> wrote:
>
> On Fri, Jul 21, 2023 at 2:00 AM Tomasz Jeznach <[email protected]> wrote:
> >
> > On Wed, Jul 19, 2023 at 8:12 PM Nick Kossifidis <[email protected]> wrote:
> > >
> > > Hello Tomasz,
> > >
> > > On 7/19/23 22:33, Tomasz Jeznach wrote:
> > > > Enables message or wire signal interrupts for PCIe and platforms devices.
> > > >
> > >
> > > The description doesn't match the subject nor the patch content (we
> > > don't jus enable interrupts, we also init the queues).
> > >
> > > > + /* Parse Queue lengts */
> > > > + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > > > + if (!ret)
> > > > + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > > > +
> > > > + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > > > + if (!ret)
> > > > + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > > > +
> > > > + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > > > + if (!ret)
> > > > + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > > > +
> > > > dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > > >
> > >
> > > We need to add those to the device tree binding doc (or throw them away,
> > > I thought it would be better to have them as part of the device
> > > desciption than a module parameter).
> > >
> >
> > We can add them as an optional fields to DT.
> > Alternatively, I've been looking into an option to auto-scale CQ/PQ
> > based on number of attached devices, but this gets trickier for
> > hot-pluggable systems. I've added module parameters as a bare-minimum,
> > but still looking for better solutions.
> >
> > >
> > > > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > > > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > > > +
> > >
> > > > + case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > > > + q = &iommu->priq;
> > > > + q->len = sizeof(struct riscv_iommu_pq_record);
> > > > + count = iommu->priq_len;
> > > > + irq = iommu->irq_priq;
> > > > + irq_check = riscv_iommu_priq_irq_check;
> > > > + irq_process = riscv_iommu_priq_process;
> > > > + q->qbr = RISCV_IOMMU_REG_PQB;
> > > > + q->qcr = RISCV_IOMMU_REG_PQCSR;
> > > > + name = "priq";
> > > > + break;
> > >
> > >
> > > It makes more sense to add the code for the page request queue in the
> > > patch that adds ATS/PRI support IMHO. This comment also applies to its
> > > interrupt handlers below.
> > >
> >
> > ack. will do.
> >
> > >
> > > > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > > > + u64 addr)
> > > > +{
> > > > + cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > > > + cmd->dword1 = addr;
> > > > +}
> > > > +
> > >
> > > This needs to be (addr >> 2) to match the spec, same as in the iofence
> > > command.
> > >
> >
> > oops. Thanks!
> >
>
> I think it should be (addr >> 12) according to the spec.
>

My reading of the spec '3.1.1. IOMMU Page-Table cache invalidation commands'
is that it is a 4k page aligned address packed at dword1[61:10], so
effectively shifted by 2 bits.

regards,
- Tomasz

> > > Regards,
> > > Nick
> > >
> >
> > regards,
> > - Tomasz
> >
> > _______________________________________________
> > linux-riscv mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-28 09:10:23

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Fri, Jul 28, 2023 at 1:19 PM Tomasz Jeznach <[email protected]> wrote:
>
> On Mon, Jul 24, 2023 at 11:47 AM Zong Li <[email protected]> wrote:
> >
> > On Fri, Jul 21, 2023 at 2:00 AM Tomasz Jeznach <[email protected]> wrote:
> > >
> > > On Wed, Jul 19, 2023 at 8:12 PM Nick Kossifidis <[email protected]> wrote:
> > > >
> > > > Hello Tomasz,
> > > >
> > > > On 7/19/23 22:33, Tomasz Jeznach wrote:
> > > > > Enables message or wire signal interrupts for PCIe and platforms devices.
> > > > >
> > > >
> > > > The description doesn't match the subject nor the patch content (we
> > > > don't jus enable interrupts, we also init the queues).
> > > >
> > > > > + /* Parse Queue lengts */
> > > > > + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > > > > + if (!ret)
> > > > > + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > > > > +
> > > > > + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > > > > + if (!ret)
> > > > > + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > > > > +
> > > > > + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > > > > + if (!ret)
> > > > > + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > > > > +
> > > > > dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > > > >
> > > >
> > > > We need to add those to the device tree binding doc (or throw them away,
> > > > I thought it would be better to have them as part of the device
> > > > desciption than a module parameter).
> > > >
> > >
> > > We can add them as an optional fields to DT.
> > > Alternatively, I've been looking into an option to auto-scale CQ/PQ
> > > based on number of attached devices, but this gets trickier for
> > > hot-pluggable systems. I've added module parameters as a bare-minimum,
> > > but still looking for better solutions.
> > >
> > > >
> > > > > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > > > > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > > > > +
> > > >
> > > > > + case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > > > > + q = &iommu->priq;
> > > > > + q->len = sizeof(struct riscv_iommu_pq_record);
> > > > > + count = iommu->priq_len;
> > > > > + irq = iommu->irq_priq;
> > > > > + irq_check = riscv_iommu_priq_irq_check;
> > > > > + irq_process = riscv_iommu_priq_process;
> > > > > + q->qbr = RISCV_IOMMU_REG_PQB;
> > > > > + q->qcr = RISCV_IOMMU_REG_PQCSR;
> > > > > + name = "priq";
> > > > > + break;
> > > >
> > > >
> > > > It makes more sense to add the code for the page request queue in the
> > > > patch that adds ATS/PRI support IMHO. This comment also applies to its
> > > > interrupt handlers below.
> > > >
> > >
> > > ack. will do.
> > >
> > > >
> > > > > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > > > > + u64 addr)
> > > > > +{
> > > > > + cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > > > > + cmd->dword1 = addr;
> > > > > +}
> > > > > +
> > > >
> > > > This needs to be (addr >> 2) to match the spec, same as in the iofence
> > > > command.
> > > >
> > >
> > > oops. Thanks!
> > >
> >
> > I think it should be (addr >> 12) according to the spec.
> >
>
> My reading of the spec '3.1.1. IOMMU Page-Table cache invalidation commands'
> is that it is a 4k page aligned address packed at dword1[61:10], so
> effectively shifted by 2 bits.

Thanks for your clarifying. Just an opinion, perhaps you can use
'FIELD_PREP()' on it as well, it might be clearer.

>
> regards,
> - Tomasz
>
> > > > Regards,
> > > > Nick
> > > >
> > >
> > > regards,
> > > - Tomasz
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > [email protected]
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-29 13:44:49

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <[email protected]> wrote:
>
> Enables message or wire signal interrupts for PCIe and platforms devices.
>
> Co-developed-by: Nick Kossifidis <[email protected]>
> Signed-off-by: Nick Kossifidis <[email protected]>
> Signed-off-by: Tomasz Jeznach <[email protected]>
> ---
> drivers/iommu/riscv/iommu-pci.c | 72 ++++
> drivers/iommu/riscv/iommu-platform.c | 66 +++
> drivers/iommu/riscv/iommu.c | 604 ++++++++++++++++++++++++++-
> drivers/iommu/riscv/iommu.h | 28 ++
> 4 files changed, 769 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> index c91f963d7a29..9ea0647f7b92 100644
> --- a/drivers/iommu/riscv/iommu-pci.c
> +++ b/drivers/iommu/riscv/iommu-pci.c
> @@ -34,6 +34,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> {
> struct device *dev = &pdev->dev;
> struct riscv_iommu_device *iommu;
> + u64 icvec;
> int ret;
>
> ret = pci_enable_device_mem(pdev);
> @@ -67,14 +68,84 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> iommu->dev = dev;
> dev_set_drvdata(dev, iommu);
>
> + /* Check device reported capabilities. */
> + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> +
> + /* The PCI driver only uses MSIs, make sure the IOMMU supports this */
> + switch (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap)) {
> + case RISCV_IOMMU_CAP_IGS_MSI:
> + case RISCV_IOMMU_CAP_IGS_BOTH:
> + break;
> + default:
> + dev_err(dev, "unable to use message-signaled interrupts\n");
> + ret = -ENODEV;
> + goto fail;
> + }
> +
> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> pci_set_master(pdev);
>
> + /* Allocate and assign IRQ vectors for the various events */
> + ret = pci_alloc_irq_vectors(pdev, 1, RISCV_IOMMU_INTR_COUNT, PCI_IRQ_MSIX);
> + if (ret < 0) {
> + dev_err(dev, "unable to allocate irq vectors\n");
> + goto fail;
> + }
> +
> + ret = -ENODEV;
> +
> + iommu->irq_cmdq = msi_get_virq(dev, RISCV_IOMMU_INTR_CQ);
> + if (!iommu->irq_cmdq) {
> + dev_warn(dev, "no MSI vector %d for the command queue\n",
> + RISCV_IOMMU_INTR_CQ);
> + goto fail;
> + }
> +
> + iommu->irq_fltq = msi_get_virq(dev, RISCV_IOMMU_INTR_FQ);
> + if (!iommu->irq_fltq) {
> + dev_warn(dev, "no MSI vector %d for the fault/event queue\n",
> + RISCV_IOMMU_INTR_FQ);
> + goto fail;
> + }
> +
> + if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> + iommu->irq_pm = msi_get_virq(dev, RISCV_IOMMU_INTR_PM);
> + if (!iommu->irq_pm) {
> + dev_warn(dev,
> + "no MSI vector %d for performance monitoring\n",
> + RISCV_IOMMU_INTR_PM);
> + goto fail;
> + }
> + }
> +
> + if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> + iommu->irq_priq = msi_get_virq(dev, RISCV_IOMMU_INTR_PQ);
> + if (!iommu->irq_priq) {
> + dev_warn(dev,
> + "no MSI vector %d for page-request queue\n",
> + RISCV_IOMMU_INTR_PQ);
> + goto fail;
> + }
> + }
> +
> + /* Set simple 1:1 mapping for MSI vectors */
> + icvec = FIELD_PREP(RISCV_IOMMU_IVEC_CIV, RISCV_IOMMU_INTR_CQ) |
> + FIELD_PREP(RISCV_IOMMU_IVEC_FIV, RISCV_IOMMU_INTR_FQ);
> +
> + if (iommu->cap & RISCV_IOMMU_CAP_HPM)
> + icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PMIV, RISCV_IOMMU_INTR_PM);
> +
> + if (iommu->cap & RISCV_IOMMU_CAP_ATS)
> + icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PIV, RISCV_IOMMU_INTR_PQ);
> +
> + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IVEC, icvec);
> +
> ret = riscv_iommu_init(iommu);
> if (!ret)
> return ret;
>
> fail:
> + pci_free_irq_vectors(pdev);
> pci_clear_master(pdev);
> pci_release_regions(pdev);
> pci_disable_device(pdev);
> @@ -85,6 +156,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> static void riscv_iommu_pci_remove(struct pci_dev *pdev)
> {
> riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> + pci_free_irq_vectors(pdev);
> pci_clear_master(pdev);
> pci_release_regions(pdev);
> pci_disable_device(pdev);
> diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> index e4e8ca6711e7..35935d3c7ef4 100644
> --- a/drivers/iommu/riscv/iommu-platform.c
> +++ b/drivers/iommu/riscv/iommu-platform.c
> @@ -20,6 +20,8 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
> struct device *dev = &pdev->dev;
> struct riscv_iommu_device *iommu = NULL;
> struct resource *res = NULL;
> + u32 fctl = 0;
> + int irq = 0;
> int ret = 0;
>
> iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> @@ -53,6 +55,70 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
> goto fail;
> }
>
> + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> +
> + /* For now we only support WSIs until we have AIA support */

I'm not completely understand AIA support here, because I saw the pci
case uses the MSI, and kernel seems to have the AIA implementation.
Could you please elaborate it?

> + ret = FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap);
> + if (ret == RISCV_IOMMU_CAP_IGS_MSI) {
> + dev_err(dev, "IOMMU only supports MSIs\n");
> + goto fail;
> + }
> +
> + /* Parse IRQ assignment */
> + irq = platform_get_irq_byname_optional(pdev, "cmdq");
> + if (irq > 0)
> + iommu->irq_cmdq = irq;
> + else {
> + dev_err(dev, "no IRQ provided for the command queue\n");
> + goto fail;
> + }
> +
> + irq = platform_get_irq_byname_optional(pdev, "fltq");
> + if (irq > 0)
> + iommu->irq_fltq = irq;
> + else {
> + dev_err(dev, "no IRQ provided for the fault/event queue\n");
> + goto fail;
> + }
> +
> + if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> + irq = platform_get_irq_byname_optional(pdev, "pm");
> + if (irq > 0)
> + iommu->irq_pm = irq;
> + else {
> + dev_err(dev, "no IRQ provided for performance monitoring\n");
> + goto fail;
> + }
> + }
> +
> + if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> + irq = platform_get_irq_byname_optional(pdev, "priq");
> + if (irq > 0)
> + iommu->irq_priq = irq;
> + else {
> + dev_err(dev, "no IRQ provided for the page-request queue\n");
> + goto fail;
> + }
> + }

Should we define the "interrupt-names" in dt-bindings?

> +
> + /* Make sure fctl.WSI is set */
> + fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> + fctl |= RISCV_IOMMU_FCTL_WSI;
> + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> +
> + /* Parse Queue lengts */
> + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> + if (!ret)
> + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> +
> + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> + if (!ret)
> + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> +
> + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> + if (!ret)
> + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> +
> dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
>
> return riscv_iommu_init(iommu);
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 31dc3c458e13..5c4cf9875302 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -45,6 +45,18 @@ static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
> module_param(ddt_mode, int, 0644);
> MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
>
> +static int cmdq_length = 1024;
> +module_param(cmdq_length, int, 0644);
> +MODULE_PARM_DESC(cmdq_length, "Command queue length.");
> +
> +static int fltq_length = 1024;
> +module_param(fltq_length, int, 0644);
> +MODULE_PARM_DESC(fltq_length, "Fault queue length.");
> +
> +static int priq_length = 1024;
> +module_param(priq_length, int, 0644);
> +MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
> +
> /* IOMMU PSCID allocation namespace. */
> #define RISCV_IOMMU_MAX_PSCID (1U << 20)
> static DEFINE_IDA(riscv_iommu_pscids);
> @@ -65,6 +77,497 @@ static DEFINE_IDA(riscv_iommu_pscids);
> static const struct iommu_domain_ops riscv_iommu_domain_ops;
> static const struct iommu_ops riscv_iommu_ops;
>
> +/*
> + * Common queue management routines
> + */
> +
> +/* Note: offsets are the same for all queues */
> +#define Q_HEAD(q) ((q)->qbr + (RISCV_IOMMU_REG_CQH - RISCV_IOMMU_REG_CQB))
> +#define Q_TAIL(q) ((q)->qbr + (RISCV_IOMMU_REG_CQT - RISCV_IOMMU_REG_CQB))
> +
> +static unsigned riscv_iommu_queue_consume(struct riscv_iommu_device *iommu,
> + struct riscv_iommu_queue *q, unsigned *ready)
> +{
> + u32 tail = riscv_iommu_readl(iommu, Q_TAIL(q));
> + *ready = q->lui;
> +
> + BUG_ON(q->cnt <= tail);
> + if (q->lui <= tail)
> + return tail - q->lui;
> + return q->cnt - q->lui;
> +}
> +
> +static void riscv_iommu_queue_release(struct riscv_iommu_device *iommu,
> + struct riscv_iommu_queue *q, unsigned count)
> +{
> + q->lui = (q->lui + count) & (q->cnt - 1);
> + riscv_iommu_writel(iommu, Q_HEAD(q), q->lui);
> +}
> +
> +static u32 riscv_iommu_queue_ctrl(struct riscv_iommu_device *iommu,
> + struct riscv_iommu_queue *q, u32 val)
> +{
> + cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
> +
> + riscv_iommu_writel(iommu, q->qcr, val);
> + do {
> + val = riscv_iommu_readl(iommu, q->qcr);
> + if (!(val & RISCV_IOMMU_QUEUE_BUSY))
> + break;
> + cpu_relax();
> + } while (get_cycles() < end_cycles);
> +
> + return val;
> +}
> +
> +static void riscv_iommu_queue_free(struct riscv_iommu_device *iommu,
> + struct riscv_iommu_queue *q)
> +{
> + size_t size = q->len * q->cnt;
> +
> + riscv_iommu_queue_ctrl(iommu, q, 0);
> +
> + if (q->base) {
> + if (q->in_iomem)
> + iounmap(q->base);
> + else
> + dmam_free_coherent(iommu->dev, size, q->base, q->base_dma);
> + }
> + if (q->irq)
> + free_irq(q->irq, q);
> +}
> +
> +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data);
> +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data);
> +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> +
> +static int riscv_iommu_queue_init(struct riscv_iommu_device *iommu, int queue_id)
> +{
> + struct device *dev = iommu->dev;
> + struct riscv_iommu_queue *q = NULL;
> + size_t queue_size = 0;
> + irq_handler_t irq_check;
> + irq_handler_t irq_process;
> + const char *name;
> + int count = 0;
> + int irq = 0;
> + unsigned order = 0;
> + u64 qbr_val = 0;
> + u64 qbr_readback = 0;
> + u64 qbr_paddr = 0;
> + int ret = 0;
> +
> + switch (queue_id) {
> + case RISCV_IOMMU_COMMAND_QUEUE:
> + q = &iommu->cmdq;
> + q->len = sizeof(struct riscv_iommu_command);
> + count = iommu->cmdq_len;
> + irq = iommu->irq_cmdq;
> + irq_check = riscv_iommu_cmdq_irq_check;
> + irq_process = riscv_iommu_cmdq_process;
> + q->qbr = RISCV_IOMMU_REG_CQB;
> + q->qcr = RISCV_IOMMU_REG_CQCSR;
> + name = "cmdq";
> + break;
> + case RISCV_IOMMU_FAULT_QUEUE:
> + q = &iommu->fltq;
> + q->len = sizeof(struct riscv_iommu_fq_record);
> + count = iommu->fltq_len;
> + irq = iommu->irq_fltq;
> + irq_check = riscv_iommu_fltq_irq_check;
> + irq_process = riscv_iommu_fltq_process;
> + q->qbr = RISCV_IOMMU_REG_FQB;
> + q->qcr = RISCV_IOMMU_REG_FQCSR;
> + name = "fltq";
> + break;
> + case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> + q = &iommu->priq;
> + q->len = sizeof(struct riscv_iommu_pq_record);
> + count = iommu->priq_len;
> + irq = iommu->irq_priq;
> + irq_check = riscv_iommu_priq_irq_check;
> + irq_process = riscv_iommu_priq_process;
> + q->qbr = RISCV_IOMMU_REG_PQB;
> + q->qcr = RISCV_IOMMU_REG_PQCSR;
> + name = "priq";
> + break;
> + default:
> + dev_err(dev, "invalid queue interrupt index in queue_init!\n");
> + return -EINVAL;
> + }
> +
> + /* Polling not implemented */
> + if (!irq)
> + return -ENODEV;
> +
> + /* Allocate queue in memory and set the base register */
> + order = ilog2(count);
> + do {
> + queue_size = q->len * (1ULL << order);
> + q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> + if (q->base || queue_size < PAGE_SIZE)
> + break;
> +
> + order--;
> + } while (1);
> +
> + if (!q->base) {
> + dev_err(dev, "failed to allocate %s queue (cnt: %u)\n", name, count);
> + return -ENOMEM;
> + }
> +
> + q->cnt = 1ULL << order;
> +
> + qbr_val = phys_to_ppn(q->base_dma) |
> + FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> +
> + riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> +
> + /*
> + * Queue base registers are WARL, so it's possible that whatever we wrote
> + * there was illegal/not supported by the hw in which case we need to make
> + * sure we set a supported PPN and/or queue size.
> + */
> + qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> + if (qbr_readback == qbr_val)
> + goto irq;
> +
> + dmam_free_coherent(dev, queue_size, q->base, q->base_dma);
> +
> + /* Get supported queue size */
> + order = FIELD_GET(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, qbr_readback) + 1;
> + q->cnt = 1ULL << order;
> + queue_size = q->len * q->cnt;
> +
> + /*
> + * In case we also failed to set PPN, it means the field is hardcoded and the
> + * queue resides in I/O memory instead, so get its physical address and
> + * ioremap it.
> + */
> + qbr_paddr = ppn_to_phys(qbr_readback);
> + if (qbr_paddr != q->base_dma) {
> + dev_info(dev,
> + "hardcoded ppn in %s base register, using io memory for the queue\n",
> + name);
> + dev_info(dev, "queue length for %s set to %i\n", name, q->cnt);
> + q->in_iomem = true;
> + q->base = ioremap(qbr_paddr, queue_size);
> + if (!q->base) {
> + dev_err(dev, "failed to map %s queue (cnt: %u)\n", name, q->cnt);
> + return -ENOMEM;
> + }
> + q->base_dma = qbr_paddr;
> + } else {
> + /*
> + * We only failed to set the queue size, re-try to allocate memory with
> + * the queue size supported by the hw.
> + */
> + dev_info(dev, "hardcoded queue size in %s base register\n", name);
> + dev_info(dev, "retrying with queue length: %i\n", q->cnt);
> + q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> + if (!q->base) {
> + dev_err(dev, "failed to allocate %s queue (cnt: %u)\n",
> + name, q->cnt);
> + return -ENOMEM;
> + }
> + }
> +
> + qbr_val = phys_to_ppn(q->base_dma) |
> + FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> + riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> +
> + /* Final check to make sure hw accepted our write */
> + qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> + if (qbr_readback != qbr_val) {
> + dev_err(dev, "failed to set base register for %s\n", name);
> + goto fail;
> + }
> +
> + irq:
> + if (request_threaded_irq(irq, irq_check, irq_process, IRQF_ONESHOT | IRQF_SHARED,
> + dev_name(dev), q)) {
> + dev_err(dev, "fail to request irq %d for %s\n", irq, name);
> + goto fail;
> + }
> +
> + q->irq = irq;
> +
> + /* Note: All RIO_xQ_EN/IE fields are in the same offsets */
> + ret =
> + riscv_iommu_queue_ctrl(iommu, q,
> + RISCV_IOMMU_QUEUE_ENABLE |
> + RISCV_IOMMU_QUEUE_INTR_ENABLE);
> + if (ret & RISCV_IOMMU_QUEUE_BUSY) {
> + dev_err(dev, "%s init timeout\n", name);
> + ret = -EBUSY;
> + goto fail;
> + }
> +
> + return 0;
> +
> + fail:
> + riscv_iommu_queue_free(iommu, q);
> + return 0;
> +}
> +
> +/*
> + * I/O MMU Command queue chapter 3.1
> + */
> +
> +static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
> +{
> + cmd->dword0 =
> + FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
> + RISCV_IOMMU_CMD_IOTINVAL_OPCODE) | FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
> + RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);
> + cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> + u64 addr)
> +{
> + cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> + cmd->dword1 = addr;
> +}
> +
> +static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
> + unsigned pscid)
> +{
> + cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
> + RISCV_IOMMU_CMD_IOTINVAL_PSCV;
> +}
> +
> +static inline void riscv_iommu_cmd_inval_set_gscid(struct riscv_iommu_command *cmd,
> + unsigned gscid)
> +{
> + cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_GSCID, gscid) |
> + RISCV_IOMMU_CMD_IOTINVAL_GV;
> +}
> +
> +static inline void riscv_iommu_cmd_iofence(struct riscv_iommu_command *cmd)
> +{
> + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C);
> + cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_iofence_set_av(struct riscv_iommu_command *cmd,
> + u64 addr, u32 data)
> +{
> + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
> + FIELD_PREP(RISCV_IOMMU_CMD_IOFENCE_DATA, data) | RISCV_IOMMU_CMD_IOFENCE_AV;
> + cmd->dword1 = (addr >> 2);
> +}
> +
> +static inline void riscv_iommu_cmd_iodir_inval_ddt(struct riscv_iommu_command *cmd)
> +{
> + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT);
> + cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_iodir_inval_pdt(struct riscv_iommu_command *cmd)
> +{
> + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT);
> + cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd,
> + unsigned devid)
> +{
> + cmd->dword0 |=
> + FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
> +}
> +
> +/* TODO: Convert into lock-less MPSC implementation. */
> +static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
> + struct riscv_iommu_command *cmd, bool sync)
> +{
> + u32 head, tail, next, last;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&iommu->cq_lock, flags);
> + head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) & (iommu->cmdq.cnt - 1);
> + tail = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> + last = iommu->cmdq.lui;
> + if (tail != last) {
> + spin_unlock_irqrestore(&iommu->cq_lock, flags);
> + /*
> + * FIXME: This is a workaround for dropped MMIO writes/reads on QEMU platform.
> + * While debugging of the problem is still ongoing, this provides
> + * a simple impolementation of try-again policy.
> + * Will be changed to lock-less algorithm in the feature.
> + */
> + dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (1st)\n", last, tail);
> + spin_lock_irqsave(&iommu->cq_lock, flags);
> + tail =
> + riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> + last = iommu->cmdq.lui;
> + if (tail != last) {
> + spin_unlock_irqrestore(&iommu->cq_lock, flags);
> + dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (2nd)\n", last, tail);
> + spin_lock_irqsave(&iommu->cq_lock, flags);
> + }
> + }
> +
> + next = (last + 1) & (iommu->cmdq.cnt - 1);
> + if (next != head) {
> + struct riscv_iommu_command *ptr = iommu->cmdq.base;
> + ptr[last] = *cmd;
> + wmb();
> + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_CQT, next);
> + iommu->cmdq.lui = next;
> + }
> +
> + spin_unlock_irqrestore(&iommu->cq_lock, flags);
> +
> + if (sync && head != next) {
> + cycles_t start_time = get_cycles();
> + while (1) {
> + last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
> + (iommu->cmdq.cnt - 1);
> + if (head < next && last >= next)
> + break;
> + if (head > next && last < head && last >= next)
> + break;
> + if (RISCV_IOMMU_TIMEOUT < (get_cycles() - start_time)) {

This condition will be imprecise, because here is not in irq disabled
context, it will be scheduled out or preempted. When we come back
here, it might be over 1 second, but the IOFENCE is actually
completed.

> + dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
> + return false;
> + }
> + cpu_relax();
> + }
> + }
> +
> + return next != head;
> +}
> +
> +static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
> + struct riscv_iommu_command *cmd)
> +{
> + return riscv_iommu_post_sync(iommu, cmd, false);
> +}
> +
> +static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
> +{
> + struct riscv_iommu_command cmd;
> + riscv_iommu_cmd_iofence(&cmd);
> + return riscv_iommu_post_sync(iommu, &cmd, true);
> +}
> +
> +/* Command queue primary interrupt handler */
> +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
> +{
> + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> + struct riscv_iommu_device *iommu =
> + container_of(q, struct riscv_iommu_device, cmdq);
> + u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> + if (ipsr & RISCV_IOMMU_IPSR_CIP)
> + return IRQ_WAKE_THREAD;
> + return IRQ_NONE;
> +}
> +
> +/* Command queue interrupt hanlder thread function */
> +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
> +{
> + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> + struct riscv_iommu_device *iommu;
> + unsigned ctrl;
> +
> + iommu = container_of(q, struct riscv_iommu_device, cmdq);
> +
> + /* Error reporting, clear error reports if any. */
> + ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQCSR);
> + if (ctrl & (RISCV_IOMMU_CQCSR_CQMF |
> + RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL)) {
> + riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);
> + dev_warn_ratelimited(iommu->dev,
> + "Command queue error: fault: %d tout: %d err: %d\n",
> + !!(ctrl & RISCV_IOMMU_CQCSR_CQMF),
> + !!(ctrl & RISCV_IOMMU_CQCSR_CMD_TO),
> + !!(ctrl & RISCV_IOMMU_CQCSR_CMD_ILL));

We need to handle the error by either adjusting the tail to remove the
failed command or fixing the failed command itself. Otherwise, the
failed command will keep in the queue and IOMMU will try to execute
it. I guess the first option might be easier to implement.

> + }
> +
> + /* Clear fault interrupt pending. */
> + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_CIP);
> +
> + return IRQ_HANDLED;
> +}
> +
> +/*
> + * Fault/event queue, chapter 3.2
> + */
> +
> +static void riscv_iommu_fault_report(struct riscv_iommu_device *iommu,
> + struct riscv_iommu_fq_record *event)
> +{
> + unsigned err, devid;
> +
> + err = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, event->hdr);
> + devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, event->hdr);
> +
> + dev_warn_ratelimited(iommu->dev,
> + "Fault %d devid: %d" " iotval: %llx iotval2: %llx\n", err,
> + devid, event->iotval, event->iotval2);
> +}
> +
> +/* Fault/event queue primary interrupt handler */
> +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data)
> +{
> + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> + struct riscv_iommu_device *iommu =
> + container_of(q, struct riscv_iommu_device, fltq);
> + u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> + if (ipsr & RISCV_IOMMU_IPSR_FIP)
> + return IRQ_WAKE_THREAD;
> + return IRQ_NONE;
> +}
> +
> +/* Fault queue interrupt hanlder thread function */
> +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data)
> +{
> + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> + struct riscv_iommu_device *iommu;
> + struct riscv_iommu_fq_record *events;
> + unsigned cnt, len, idx, ctrl;
> +
> + iommu = container_of(q, struct riscv_iommu_device, fltq);
> + events = (struct riscv_iommu_fq_record *)q->base;
> +
> + /* Error reporting, clear error reports if any. */
> + ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQCSR);
> + if (ctrl & (RISCV_IOMMU_FQCSR_FQMF | RISCV_IOMMU_FQCSR_FQOF)) {
> + riscv_iommu_queue_ctrl(iommu, &iommu->fltq, ctrl);
> + dev_warn_ratelimited(iommu->dev,
> + "Fault queue error: fault: %d full: %d\n",
> + !!(ctrl & RISCV_IOMMU_FQCSR_FQMF),
> + !!(ctrl & RISCV_IOMMU_FQCSR_FQOF));
> + }
> +
> + /* Clear fault interrupt pending. */
> + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_FIP);
> +
> + /* Report fault events. */
> + do {
> + cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> + if (!cnt)
> + break;
> + for (len = 0; len < cnt; idx++, len++)
> + riscv_iommu_fault_report(iommu, &events[idx]);
> + riscv_iommu_queue_release(iommu, q, cnt);
> + } while (1);
> +
> + return IRQ_HANDLED;
> +}
> +
> +/*
> + * Page request queue, chapter 3.3
> + */
> +
> /*
> * Register device for IOMMU tracking.
> */
> @@ -97,6 +600,54 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
> mutex_unlock(&iommu->eps_mutex);
> }
>
> +/* Page request interface queue primary interrupt handler */
> +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
> +{
> + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> + struct riscv_iommu_device *iommu =
> + container_of(q, struct riscv_iommu_device, priq);
> + u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> + if (ipsr & RISCV_IOMMU_IPSR_PIP)
> + return IRQ_WAKE_THREAD;
> + return IRQ_NONE;
> +}
> +
> +/* Page request interface queue interrupt hanlder thread function */
> +static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> +{
> + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> + struct riscv_iommu_device *iommu;
> + struct riscv_iommu_pq_record *requests;
> + unsigned cnt, idx, ctrl;
> +
> + iommu = container_of(q, struct riscv_iommu_device, priq);
> + requests = (struct riscv_iommu_pq_record *)q->base;
> +
> + /* Error reporting, clear error reports if any. */
> + ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_PQCSR);
> + if (ctrl & (RISCV_IOMMU_PQCSR_PQMF | RISCV_IOMMU_PQCSR_PQOF)) {
> + riscv_iommu_queue_ctrl(iommu, &iommu->priq, ctrl);
> + dev_warn_ratelimited(iommu->dev,
> + "Page request queue error: fault: %d full: %d\n",
> + !!(ctrl & RISCV_IOMMU_PQCSR_PQMF),
> + !!(ctrl & RISCV_IOMMU_PQCSR_PQOF));
> + }
> +
> + /* Clear page request interrupt pending. */
> + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_PIP);
> +
> + /* Process page requests. */
> + do {
> + cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> + if (!cnt)
> + break;
> + dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
> + riscv_iommu_queue_release(iommu, q, cnt);
> + } while (1);
> +
> + return IRQ_HANDLED;
> +}
> +
> /*
> * Endpoint management
> */
> @@ -350,7 +901,29 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> unsigned long *start, unsigned long *end,
> size_t *pgsize)
> {
> - /* Command interface not implemented */
> + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> + struct riscv_iommu_command cmd;
> + unsigned long iova;
> +
> + if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
> + return;
> +
> + /* Domain not attached to an IOMMU! */
> + BUG_ON(!domain->iommu);
> +
> + riscv_iommu_cmd_inval_vma(&cmd);
> + riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
> +
> + if (start && end && pgsize) {
> + /* Cover only the range that is needed */
> + for (iova = *start; iova <= *end; iova += *pgsize) {
> + riscv_iommu_cmd_inval_set_addr(&cmd, iova);
> + riscv_iommu_post(domain->iommu, &cmd);
> + }
> + } else {
> + riscv_iommu_post(domain->iommu, &cmd);
> + }
> + riscv_iommu_iofence_sync(domain->iommu);
> }
>
> static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> @@ -610,6 +1183,9 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
> iommu_device_unregister(&iommu->iommu);
> iommu_device_sysfs_remove(&iommu->iommu);
> riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> + riscv_iommu_queue_free(iommu, &iommu->cmdq);
> + riscv_iommu_queue_free(iommu, &iommu->fltq);
> + riscv_iommu_queue_free(iommu, &iommu->priq);
> }
>
> int riscv_iommu_init(struct riscv_iommu_device *iommu)
> @@ -632,6 +1208,16 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> }
> #endif
>
> + /*
> + * Assign queue lengths from module parameters if not already
> + * set on the device tree.
> + */
> + if (!iommu->cmdq_len)
> + iommu->cmdq_len = cmdq_length;
> + if (!iommu->fltq_len)
> + iommu->fltq_len = fltq_length;
> + if (!iommu->priq_len)
> + iommu->priq_len = priq_length;
> /* Clear any pending interrupt flag. */
> riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
> RISCV_IOMMU_IPSR_CIP |
> @@ -639,7 +1225,20 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
> spin_lock_init(&iommu->cq_lock);
> mutex_init(&iommu->eps_mutex);
> + ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_COMMAND_QUEUE);
> + if (ret)
> + goto fail;
> + ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_FAULT_QUEUE);
> + if (ret)
> + goto fail;
> + if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
> + goto no_ats;
> +
> + ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
> + if (ret)
> + goto fail;
>
> + no_ats:
> ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
>
> if (ret) {
> @@ -663,5 +1262,8 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> return 0;
> fail:
> riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> + riscv_iommu_queue_free(iommu, &iommu->priq);
> + riscv_iommu_queue_free(iommu, &iommu->fltq);
> + riscv_iommu_queue_free(iommu, &iommu->cmdq);
> return ret;
> }
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 7dc9baa59a50..04148a2a8ffd 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -28,6 +28,24 @@
> #define IOMMU_PAGE_SIZE_1G BIT_ULL(30)
> #define IOMMU_PAGE_SIZE_512G BIT_ULL(39)
>
> +struct riscv_iommu_queue {
> + dma_addr_t base_dma; /* ring buffer bus address */
> + void *base; /* ring buffer pointer */
> + size_t len; /* single item length */
> + u32 cnt; /* items count */
> + u32 lui; /* last used index, consumer/producer share */
> + unsigned qbr; /* queue base register offset */
> + unsigned qcr; /* queue control and status register offset */
> + int irq; /* registered interrupt number */
> + bool in_iomem; /* indicates queue data are in I/O memory */
> +};
> +
> +enum riscv_queue_ids {
> + RISCV_IOMMU_COMMAND_QUEUE = 0,
> + RISCV_IOMMU_FAULT_QUEUE = 1,
> + RISCV_IOMMU_PAGE_REQUEST_QUEUE = 2
> +};
> +
> struct riscv_iommu_device {
> struct iommu_device iommu; /* iommu core interface */
> struct device *dev; /* iommu hardware */
> @@ -42,6 +60,11 @@ struct riscv_iommu_device {
> int irq_pm;
> int irq_priq;
>
> + /* Queue lengths */
> + int cmdq_len;
> + int fltq_len;
> + int priq_len;
> +
> /* supported and enabled hardware capabilities */
> u64 cap;
>
> @@ -53,6 +76,11 @@ struct riscv_iommu_device {
> unsigned ddt_mode;
> bool ddtp_in_iomem;
>
> + /* hardware queues */
> + struct riscv_iommu_queue cmdq;
> + struct riscv_iommu_queue fltq;
> + struct riscv_iommu_queue priq;
> +
> /* Connected end-points */
> struct rb_root eps;
> struct mutex eps_mutex;
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-31 08:57:31

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 10/11] RISC-V: drivers/iommu/riscv: Add MSI identity remapping

On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <[email protected]> wrote:
>
> This change provides basic identity mapping support to
> excercise MSI_FLAT hardware capability.
>
> Signed-off-by: Tomasz Jeznach <[email protected]>
> ---
> drivers/iommu/riscv/iommu.c | 81 +++++++++++++++++++++++++++++++++++++
> drivers/iommu/riscv/iommu.h | 3 ++
> 2 files changed, 84 insertions(+)
>
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 6042c35be3ca..7b3e3e135cf6 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -61,6 +61,9 @@ MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
> #define RISCV_IOMMU_MAX_PSCID (1U << 20)
> static DEFINE_IDA(riscv_iommu_pscids);
>
> +/* TODO: Enable MSI remapping */
> +#define RISCV_IMSIC_BASE 0x28000000

I'm not sure if it is appropriate to hard code the base address of
peripheral in source code, it might be depends on the layout of each
target.

> +
> /* 1 second */
> #define RISCV_IOMMU_TIMEOUT riscv_timebase
>
> @@ -932,6 +935,72 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> * Endpoint management
> */
>
> +static int riscv_iommu_enable_ir(struct riscv_iommu_endpoint *ep)
> +{
> + struct riscv_iommu_device *iommu = ep->iommu;
> + struct iommu_resv_region *entry;
> + struct irq_domain *msi_domain;
> + u64 val;
> + int i;
> +
> + /* Initialize MSI remapping */
> + if (!ep->dc || !(iommu->cap & RISCV_IOMMU_CAP_MSI_FLAT))
> + return 0;
> +
> + ep->msi_root = (struct riscv_iommu_msi_pte *)get_zeroed_page(GFP_KERNEL);
> + if (!ep->msi_root)
> + return -ENOMEM;
> +
> + for (i = 0; i < 256; i++) {
> + ep->msi_root[i].pte = RISCV_IOMMU_MSI_PTE_V |
> + FIELD_PREP(RISCV_IOMMU_MSI_PTE_M, 3) |
> + phys_to_ppn(RISCV_IMSIC_BASE + i * PAGE_SIZE);
> + }
> +
> + entry = iommu_alloc_resv_region(RISCV_IMSIC_BASE, PAGE_SIZE * 256, 0,
> + IOMMU_RESV_SW_MSI, GFP_KERNEL);
> + if (entry)
> + list_add_tail(&entry->list, &ep->regions);
> +
> + val = virt_to_pfn(ep->msi_root) |
> + FIELD_PREP(RISCV_IOMMU_DC_MSIPTP_MODE, RISCV_IOMMU_DC_MSIPTP_MODE_FLAT);
> + ep->dc->msiptp = cpu_to_le64(val);
> +
> + /* Single page of MSIPTP, 256 IMSIC files */
> + ep->dc->msi_addr_mask = cpu_to_le64(255);
> + ep->dc->msi_addr_pattern = cpu_to_le64(RISCV_IMSIC_BASE >> 12);
> + wmb();
> +
> + /* set msi domain for the device as isolated. hack. */
> + msi_domain = dev_get_msi_domain(ep->dev);
> + if (msi_domain) {
> + msi_domain->flags |= IRQ_DOMAIN_FLAG_ISOLATED_MSI;
> + }
> +
> + dev_dbg(ep->dev, "RV-IR enabled\n");
> +
> + ep->ir_enabled = true;
> +
> + return 0;
> +}
> +
> +static void riscv_iommu_disable_ir(struct riscv_iommu_endpoint *ep)
> +{
> + if (!ep->ir_enabled)
> + return;
> +
> + ep->dc->msi_addr_pattern = 0ULL;
> + ep->dc->msi_addr_mask = 0ULL;
> + ep->dc->msiptp = 0ULL;
> + wmb();
> +
> + dev_dbg(ep->dev, "RV-IR disabled\n");
> +
> + free_pages((unsigned long)ep->msi_root, 0);
> + ep->msi_root = NULL;
> + ep->ir_enabled = false;
> +}
> +
> /* Endpoint features/capabilities */
> static void riscv_iommu_disable_ep(struct riscv_iommu_endpoint *ep)
> {
> @@ -1226,6 +1295,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>
> mutex_init(&ep->lock);
> INIT_LIST_HEAD(&ep->domain);
> + INIT_LIST_HEAD(&ep->regions);
>
> if (dev_is_pci(dev)) {
> ep->devid = pci_dev_id(to_pci_dev(dev));
> @@ -1248,6 +1318,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
> dev_iommu_priv_set(dev, ep);
> riscv_iommu_add_device(iommu, dev);
> riscv_iommu_enable_ep(ep);
> + riscv_iommu_enable_ir(ep);
>
> return &iommu->iommu;
> }
> @@ -1279,6 +1350,7 @@ static void riscv_iommu_release_device(struct device *dev)
> riscv_iommu_iodir_inv_devid(iommu, ep->devid);
> }
>
> + riscv_iommu_disable_ir(ep);
> riscv_iommu_disable_ep(ep);
>
> /* Remove endpoint from IOMMU tracking structures */
> @@ -1301,6 +1373,15 @@ static struct iommu_group *riscv_iommu_device_group(struct device *dev)
>
> static void riscv_iommu_get_resv_regions(struct device *dev, struct list_head *head)
> {
> + struct iommu_resv_region *entry, *new_entry;
> + struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +
> + list_for_each_entry(entry, &ep->regions, list) {
> + new_entry = kmemdup(entry, sizeof(*entry), GFP_KERNEL);
> + if (new_entry)
> + list_add_tail(&new_entry->list, head);
> + }
> +
> iommu_dma_get_resv_regions(dev, head);
> }
>
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 83e8d00fd0f8..55418a1144fb 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -117,14 +117,17 @@ struct riscv_iommu_endpoint {
> struct riscv_iommu_dc *dc; /* device context pointer */
> struct riscv_iommu_pc *pc; /* process context root, valid if pasid_enabled is true */
> struct riscv_iommu_device *iommu; /* parent iommu device */
> + struct riscv_iommu_msi_pte *msi_root; /* interrupt re-mapping */
>
> struct mutex lock;
> struct list_head domain; /* endpoint attached managed domain */
> + struct list_head regions; /* reserved regions, interrupt remapping window */
>
> /* end point info bits */
> unsigned pasid_bits;
> unsigned pasid_feat;
> bool pasid_enabled;
> + bool ir_enabled;
> };
>
> /* Helper functions and macros */
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-31 09:40:23

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 09/11] RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support.

On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <[email protected]> wrote:
>
> Introduces SVA (Shared Virtual Address) for RISC-V IOMMU, with
> ATS/PRI services for capable devices.
>
> Co-developed-by: Sebastien Boeuf <[email protected]>
> Signed-off-by: Sebastien Boeuf <[email protected]>
> Signed-off-by: Tomasz Jeznach <[email protected]>
> ---
> drivers/iommu/riscv/iommu.c | 601 +++++++++++++++++++++++++++++++++++-
> drivers/iommu/riscv/iommu.h | 14 +
> 2 files changed, 610 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 2ef6952a2109..6042c35be3ca 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -384,6 +384,89 @@ static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd
> FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
> }
>
> +static inline void riscv_iommu_cmd_iodir_set_pid(struct riscv_iommu_command *cmd,
> + unsigned pasid)
> +{
> + cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IODIR_PID, pasid);
> +}
> +
> +static void riscv_iommu_cmd_ats_inval(struct riscv_iommu_command *cmd)
> +{
> + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_ATS_OPCODE) |
> + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_ATS_FUNC_INVAL);
> + cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_ats_prgr(struct riscv_iommu_command *cmd)
> +{
> + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_ATS_OPCODE) |
> + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_ATS_FUNC_PRGR);
> + cmd->dword1 = 0;
> +}
> +
> +static void riscv_iommu_cmd_ats_set_rid(struct riscv_iommu_command *cmd, u32 rid)
> +{
> + cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_RID, rid);
> +}
> +
> +static void riscv_iommu_cmd_ats_set_pid(struct riscv_iommu_command *cmd, u32 pid)
> +{
> + cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_PID, pid) | RISCV_IOMMU_CMD_ATS_PV;
> +}
> +
> +static void riscv_iommu_cmd_ats_set_dseg(struct riscv_iommu_command *cmd, u8 seg)
> +{
> + cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_DSEG, seg) | RISCV_IOMMU_CMD_ATS_DSV;
> +}
> +
> +static void riscv_iommu_cmd_ats_set_payload(struct riscv_iommu_command *cmd, u64 payload)
> +{
> + cmd->dword1 = payload;
> +}
> +
> +/* Prepare the ATS invalidation payload */
> +static unsigned long riscv_iommu_ats_inval_payload(unsigned long start,
> + unsigned long end, bool global_inv)
> +{
> + size_t len = end - start + 1;
> + unsigned long payload = 0;
> +
> + /*
> + * PCI Express specification
> + * Section 10.2.3.2 Translation Range Size (S) Field
> + */
> + if (len < PAGE_SIZE)
> + len = PAGE_SIZE;
> + else
> + len = __roundup_pow_of_two(len);
> +
> + payload = (start & ~(len - 1)) | (((len - 1) >> 12) << 11);
> +
> + if (global_inv)
> + payload |= RISCV_IOMMU_CMD_ATS_INVAL_G;
> +
> + return payload;
> +}
> +
> +/* Prepare the ATS invalidation payload for all translations to be invalidated. */
> +static unsigned long riscv_iommu_ats_inval_all_payload(bool global_inv)
> +{
> + unsigned long payload = GENMASK_ULL(62, 11);
> +
> + if (global_inv)
> + payload |= RISCV_IOMMU_CMD_ATS_INVAL_G;
> +
> + return payload;
> +}
> +
> +/* Prepare the ATS "Page Request Group Response" payload */
> +static unsigned long riscv_iommu_ats_prgr_payload(u16 dest_id, u8 resp_code, u16 grp_idx)
> +{
> + return FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_DST_ID, dest_id) |
> + FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE, resp_code) |
> + FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_PRG_INDEX, grp_idx);
> +}
> +
> /* TODO: Convert into lock-less MPSC implementation. */
> static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
> struct riscv_iommu_command *cmd, bool sync)
> @@ -460,6 +543,16 @@ static bool riscv_iommu_iodir_inv_devid(struct riscv_iommu_device *iommu, unsign
> return riscv_iommu_post(iommu, &cmd);
> }
>
> +static bool riscv_iommu_iodir_inv_pasid(struct riscv_iommu_device *iommu,
> + unsigned devid, unsigned pasid)
> +{
> + struct riscv_iommu_command cmd;
> + riscv_iommu_cmd_iodir_inval_pdt(&cmd);
> + riscv_iommu_cmd_iodir_set_did(&cmd, devid);
> + riscv_iommu_cmd_iodir_set_pid(&cmd, pasid);
> + return riscv_iommu_post(iommu, &cmd);
> +}
> +
> static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
> {
> struct riscv_iommu_command cmd;
> @@ -467,6 +560,62 @@ static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
> return riscv_iommu_post_sync(iommu, &cmd, true);
> }
>
> +static void riscv_iommu_mm_invalidate(struct mmu_notifier *mn,
> + struct mm_struct *mm, unsigned long start,
> + unsigned long end)
> +{
> + struct riscv_iommu_command cmd;
> + struct riscv_iommu_endpoint *endpoint;
> + struct riscv_iommu_domain *domain =
> + container_of(mn, struct riscv_iommu_domain, mn);
> + unsigned long iova;
> + /*
> + * The mm_types defines vm_end as the first byte after the end address,
> + * different from IOMMU subsystem using the last address of an address
> + * range. So do a simple translation here by updating what end means.
> + */
> + unsigned long payload = riscv_iommu_ats_inval_payload(start, end - 1, true);
> +
> + riscv_iommu_cmd_inval_vma(&cmd);
> + riscv_iommu_cmd_inval_set_gscid(&cmd, 0);
> + riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
> + if (end > start) {
> + /* Cover only the range that is needed */
> + for (iova = start; iova < end; iova += PAGE_SIZE) {
> + riscv_iommu_cmd_inval_set_addr(&cmd, iova);
> + riscv_iommu_post(domain->iommu, &cmd);
> + }
> + } else {
> + riscv_iommu_post(domain->iommu, &cmd);
> + }
> +
> + riscv_iommu_iofence_sync(domain->iommu);
> +
> + /* ATS invalidation for every device and for specific translation range. */
> + list_for_each_entry(endpoint, &domain->endpoints, domain) {
> + if (!endpoint->pasid_enabled)
> + continue;
> +
> + riscv_iommu_cmd_ats_inval(&cmd);
> + riscv_iommu_cmd_ats_set_dseg(&cmd, endpoint->domid);
> + riscv_iommu_cmd_ats_set_rid(&cmd, endpoint->devid);
> + riscv_iommu_cmd_ats_set_pid(&cmd, domain->pasid);
> + riscv_iommu_cmd_ats_set_payload(&cmd, payload);
> + riscv_iommu_post(domain->iommu, &cmd);
> + }
> + riscv_iommu_iofence_sync(domain->iommu);
> +}
> +
> +static void riscv_iommu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> +{
> + /* TODO: removed from notifier, cleanup PSCID mapping, flush IOTLB */
> +}
> +
> +static const struct mmu_notifier_ops riscv_iommu_mmuops = {
> + .release = riscv_iommu_mm_release,
> + .invalidate_range = riscv_iommu_mm_invalidate,
> +};
> +
> /* Command queue primary interrupt handler */
> static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
> {
> @@ -608,6 +757,128 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
> mutex_unlock(&iommu->eps_mutex);
> }
>
> +/*
> + * Get device reference based on device identifier (requester id).
> + * Decrement reference count with put_device() call.
> + */
> +static struct device *riscv_iommu_get_device(struct riscv_iommu_device *iommu,
> + unsigned devid)
> +{
> + struct rb_node *node;
> + struct riscv_iommu_endpoint *ep;
> + struct device *dev = NULL;
> +
> + mutex_lock(&iommu->eps_mutex);
> +
> + node = iommu->eps.rb_node;
> + while (node && !dev) {
> + ep = rb_entry(node, struct riscv_iommu_endpoint, node);
> + if (ep->devid < devid)
> + node = node->rb_right;
> + else if (ep->devid > devid)
> + node = node->rb_left;
> + else
> + dev = get_device(ep->dev);
> + }
> +
> + mutex_unlock(&iommu->eps_mutex);
> +
> + return dev;
> +}
> +
> +static int riscv_iommu_ats_prgr(struct device *dev, struct iommu_page_response *msg)
> +{
> + struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> + struct riscv_iommu_command cmd;
> + u8 resp_code;
> + unsigned long payload;
> +
> + switch (msg->code) {
> + case IOMMU_PAGE_RESP_SUCCESS:
> + resp_code = 0b0000;
> + break;
> + case IOMMU_PAGE_RESP_INVALID:
> + resp_code = 0b0001;
> + break;
> + case IOMMU_PAGE_RESP_FAILURE:
> + resp_code = 0b1111;
> + break;
> + }
> + payload = riscv_iommu_ats_prgr_payload(ep->devid, resp_code, msg->grpid);
> +
> + /* ATS Page Request Group Response */
> + riscv_iommu_cmd_ats_prgr(&cmd);
> + riscv_iommu_cmd_ats_set_dseg(&cmd, ep->domid);
> + riscv_iommu_cmd_ats_set_rid(&cmd, ep->devid);
> + if (msg->flags & IOMMU_PAGE_RESP_PASID_VALID)
> + riscv_iommu_cmd_ats_set_pid(&cmd, msg->pasid);
> + riscv_iommu_cmd_ats_set_payload(&cmd, payload);
> + riscv_iommu_post(ep->iommu, &cmd);
> +
> + return 0;
> +}
> +
> +static void riscv_iommu_page_request(struct riscv_iommu_device *iommu,
> + struct riscv_iommu_pq_record *req)
> +{
> + struct iommu_fault_event event = { 0 };
> + struct iommu_fault_page_request *prm = &event.fault.prm;
> + int ret;
> + struct device *dev;
> + unsigned devid = FIELD_GET(RISCV_IOMMU_PREQ_HDR_DID, req->hdr);
> +
> + /* Ignore PGR Stop marker. */
> + if ((req->payload & RISCV_IOMMU_PREQ_PAYLOAD_M) == RISCV_IOMMU_PREQ_PAYLOAD_L)
> + return;
> +
> + dev = riscv_iommu_get_device(iommu, devid);
> + if (!dev) {
> + /* TODO: Handle invalid page request */
> + return;
> + }
> +
> + event.fault.type = IOMMU_FAULT_PAGE_REQ;
> +
> + if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_L)
> + prm->flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> + if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_W)
> + prm->perm |= IOMMU_FAULT_PERM_WRITE;
> + if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_R)
> + prm->perm |= IOMMU_FAULT_PERM_READ;
> +
> + prm->grpid = FIELD_GET(RISCV_IOMMU_PREQ_PRG_INDEX, req->payload);
> + prm->addr = FIELD_GET(RISCV_IOMMU_PREQ_UADDR, req->payload) << PAGE_SHIFT;
> +
> + if (req->hdr & RISCV_IOMMU_PREQ_HDR_PV) {
> + prm->flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> + /* TODO: where to find this bit */
> + prm->flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
> + prm->pasid = FIELD_GET(RISCV_IOMMU_PREQ_HDR_PID, req->hdr);
> + }
> +
> + ret = iommu_report_device_fault(dev, &event);
> + if (ret) {
> + struct iommu_page_response resp = {
> + .grpid = prm->grpid,
> + .code = IOMMU_PAGE_RESP_FAILURE,
> + };
> + if (prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID) {
> + resp.flags |= IOMMU_PAGE_RESP_PASID_VALID;
> + resp.pasid = prm->pasid;
> + }
> + riscv_iommu_ats_prgr(dev, &resp);
> + }
> +
> + put_device(dev);
> +}
> +
> +static int riscv_iommu_page_response(struct device *dev,
> + struct iommu_fault_event *evt,
> + struct iommu_page_response *msg)
> +{
> + return riscv_iommu_ats_prgr(dev, msg);
> +}
> +
> /* Page request interface queue primary interrupt handler */
> static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
> {
> @@ -626,7 +897,7 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> struct riscv_iommu_device *iommu;
> struct riscv_iommu_pq_record *requests;
> - unsigned cnt, idx, ctrl;
> + unsigned cnt, len, idx, ctrl;
>
> iommu = container_of(q, struct riscv_iommu_device, priq);
> requests = (struct riscv_iommu_pq_record *)q->base;
> @@ -649,7 +920,8 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> if (!cnt)
> break;
> - dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
> + for (len = 0; len < cnt; idx++, len++)
> + riscv_iommu_page_request(iommu, &requests[idx]);
> riscv_iommu_queue_release(iommu, q, cnt);
> } while (1);
>
> @@ -660,6 +932,169 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> * Endpoint management
> */
>
> +/* Endpoint features/capabilities */
> +static void riscv_iommu_disable_ep(struct riscv_iommu_endpoint *ep)
> +{
> + struct pci_dev *pdev;
> +
> + if (!dev_is_pci(ep->dev))
> + return;
> +
> + pdev = to_pci_dev(ep->dev);
> +
> + if (ep->pasid_enabled) {
> + pci_disable_ats(pdev);
> + pci_disable_pri(pdev);
> + pci_disable_pasid(pdev);
> + ep->pasid_enabled = false;
> + }
> +}
> +
> +static void riscv_iommu_enable_ep(struct riscv_iommu_endpoint *ep)
> +{
> + int rc, feat, num;
> + struct pci_dev *pdev;
> + struct device *dev = ep->dev;
> +
> + if (!dev_is_pci(dev))
> + return;
> +
> + if (!ep->iommu->iommu.max_pasids)
> + return;
> +
> + pdev = to_pci_dev(dev);
> +
> + if (!pci_ats_supported(pdev))
> + return;
> +
> + if (!pci_pri_supported(pdev))
> + return;
> +
> + feat = pci_pasid_features(pdev);
> + if (feat < 0)
> + return;
> +
> + num = pci_max_pasids(pdev);
> + if (!num) {
> + dev_warn(dev, "Can't enable PASID (num: %d)\n", num);
> + return;
> + }
> +
> + if (num > ep->iommu->iommu.max_pasids)
> + num = ep->iommu->iommu.max_pasids;
> +
> + rc = pci_enable_pasid(pdev, feat);
> + if (rc) {
> + dev_warn(dev, "Can't enable PASID (rc: %d)\n", rc);
> + return;
> + }
> +
> + rc = pci_reset_pri(pdev);
> + if (rc) {
> + dev_warn(dev, "Can't reset PRI (rc: %d)\n", rc);
> + pci_disable_pasid(pdev);
> + return;
> + }
> +
> + /* TODO: Get supported PRI queue length, hard-code to 32 entries */
> + rc = pci_enable_pri(pdev, 32);
> + if (rc) {
> + dev_warn(dev, "Can't enable PRI (rc: %d)\n", rc);
> + pci_disable_pasid(pdev);
> + return;
> + }
> +
> + rc = pci_enable_ats(pdev, PAGE_SHIFT);
> + if (rc) {
> + dev_warn(dev, "Can't enable ATS (rc: %d)\n", rc);
> + pci_disable_pri(pdev);
> + pci_disable_pasid(pdev);
> + return;
> + }
> +
> + ep->pc = (struct riscv_iommu_pc *)get_zeroed_page(GFP_KERNEL);
> + if (!ep->pc) {
> + pci_disable_ats(pdev);
> + pci_disable_pri(pdev);
> + pci_disable_pasid(pdev);
> + return;
> + }
> +
> + ep->pasid_enabled = true;
> + ep->pasid_feat = feat;
> + ep->pasid_bits = ilog2(num);
> +
> + dev_dbg(ep->dev, "PASID/ATS support enabled, %d bits\n", ep->pasid_bits);
> +}
> +
> +static int riscv_iommu_enable_sva(struct device *dev)
> +{
> + int ret;
> + struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +
> + if (!ep || !ep->iommu || !ep->iommu->pq_work)
> + return -EINVAL;
> +
> + if (!ep->pasid_enabled)
> + return -ENODEV;
> +
> + ret = iopf_queue_add_device(ep->iommu->pq_work, dev);
> + if (ret)
> + return ret;
> +
> + return iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
> +}
> +
> +static int riscv_iommu_disable_sva(struct device *dev)
> +{
> + int ret;
> + struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +
> + ret = iommu_unregister_device_fault_handler(dev);
> + if (!ret)
> + ret = iopf_queue_remove_device(ep->iommu->pq_work, dev);
> +
> + return ret;
> +}
> +
> +static int riscv_iommu_enable_iopf(struct device *dev)
> +{
> + struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +
> + if (ep && ep->pasid_enabled)
> + return 0;
> +
> + return -EINVAL;
> +}
> +
> +static int riscv_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
> +{
> + switch (feat) {
> + case IOMMU_DEV_FEAT_IOPF:
> + return riscv_iommu_enable_iopf(dev);
> +
> + case IOMMU_DEV_FEAT_SVA:
> + return riscv_iommu_enable_sva(dev);
> +
> + default:
> + return -ENODEV;
> + }
> +}
> +
> +static int riscv_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
> +{
> + switch (feat) {
> + case IOMMU_DEV_FEAT_IOPF:
> + return 0;
> +
> + case IOMMU_DEV_FEAT_SVA:
> + return riscv_iommu_disable_sva(dev);
> +
> + default:
> + return -ENODEV;
> + }
> +}
> +
> static int riscv_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
> {
> return iommu_fwspec_add_ids(dev, args->args, 1);
> @@ -812,6 +1247,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>
> dev_iommu_priv_set(dev, ep);
> riscv_iommu_add_device(iommu, dev);
> + riscv_iommu_enable_ep(ep);
>
> return &iommu->iommu;
> }
> @@ -843,6 +1279,8 @@ static void riscv_iommu_release_device(struct device *dev)
> riscv_iommu_iodir_inv_devid(iommu, ep->devid);
> }
>
> + riscv_iommu_disable_ep(ep);
> +
> /* Remove endpoint from IOMMU tracking structures */
> mutex_lock(&iommu->eps_mutex);
> rb_erase(&ep->node, &iommu->eps);
> @@ -878,7 +1316,8 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
> type != IOMMU_DOMAIN_DMA_FQ &&
> type != IOMMU_DOMAIN_UNMANAGED &&
> type != IOMMU_DOMAIN_IDENTITY &&
> - type != IOMMU_DOMAIN_BLOCKED)
> + type != IOMMU_DOMAIN_BLOCKED &&
> + type != IOMMU_DOMAIN_SVA)
> return NULL;
>
> domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> @@ -906,6 +1345,9 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
> pr_warn("IOMMU domain is not empty!\n");
> }
>
> + if (domain->mn.ops && iommu_domain->mm)
> + mmu_notifier_unregister(&domain->mn, iommu_domain->mm);
> +
> if (domain->pgtbl.cookie)
> free_io_pgtable_ops(&domain->pgtbl.ops);
>
> @@ -1023,14 +1465,29 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
> */
> val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
>
> - dc->ta = cpu_to_le64(val);
> - dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> + if (ep->pasid_enabled) {
> + ep->pc[0].ta = cpu_to_le64(val | RISCV_IOMMU_PC_TA_V);
> + ep->pc[0].fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> + dc->ta = 0;
> + dc->fsc = cpu_to_le64(virt_to_pfn(ep->pc) |
> + FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8));

Could I know why we determinate to use PD8 directly? Rather than PD17 or PD20.

> + } else {
> + dc->ta = cpu_to_le64(val);
> + dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> + }
>
> wmb();
>
> /* Mark device context as valid, synchronise device context cache. */
> val = RISCV_IOMMU_DC_TC_V;
>
> + if (ep->pasid_enabled) {
> + val |= RISCV_IOMMU_DC_TC_EN_ATS |
> + RISCV_IOMMU_DC_TC_EN_PRI |
> + RISCV_IOMMU_DC_TC_DPE |
> + RISCV_IOMMU_DC_TC_PDTV;
> + }
> +
> if (ep->iommu->cap & RISCV_IOMMU_CAP_AMO) {
> val |= RISCV_IOMMU_DC_TC_GADE |
> RISCV_IOMMU_DC_TC_SADE;
> @@ -1051,13 +1508,107 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
> return 0;
> }
>
> +static int riscv_iommu_set_dev_pasid(struct iommu_domain *iommu_domain,
> + struct device *dev, ioasid_t pasid)
> +{
> + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> + struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> + u64 ta, fsc;
> +
> + if (!iommu_domain || !iommu_domain->mm)
> + return -EINVAL;
> +
> + /* Driver uses TC.DPE mode, PASID #0 is incorrect. */
> + if (pasid == 0)
> + return -EINVAL;
> +
> + /* Incorrect domain identifier */
> + if ((int)domain->pscid < 0)
> + return -ENOMEM;
> +
> + /* Process Context table should be set for pasid enabled endpoints. */
> + if (!ep || !ep->pasid_enabled || !ep->dc || !ep->pc)
> + return -ENODEV;
> +
> + domain->pasid = pasid;
> + domain->iommu = ep->iommu;
> + domain->mn.ops = &riscv_iommu_mmuops;
> +
> + /* register mm notifier */
> + if (mmu_notifier_register(&domain->mn, iommu_domain->mm))
> + return -ENODEV;
> +
> + /* TODO: get SXL value for the process, use 32 bit or SATP mode */
> + fsc = virt_to_pfn(iommu_domain->mm->pgd) | satp_mode;
> + ta = RISCV_IOMMU_PC_TA_V | FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid);
> +
> + fsc = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].fsc), cpu_to_le64(fsc)));
> + ta = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].ta), cpu_to_le64(ta)));
> +
> + wmb();
> +
> + if (ta & RISCV_IOMMU_PC_TA_V) {
> + riscv_iommu_iodir_inv_pasid(ep->iommu, ep->devid, pasid);
> + riscv_iommu_iofence_sync(ep->iommu);
> + }
> +
> + dev_info(dev, "domain type %d attached w/ PSCID %u PASID %u\n",
> + domain->domain.type, domain->pscid, domain->pasid);
> +
> + return 0;
> +}
> +
> +static void riscv_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
> +{
> + struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> + struct riscv_iommu_command cmd;
> + unsigned long payload = riscv_iommu_ats_inval_all_payload(false);
> + u64 ta;
> +
> + /* invalidate TA.V */
> + ta = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].ta), 0));
> +
> + wmb();
> +
> + dev_info(dev, "domain removed w/ PSCID %u PASID %u\n",
> + (unsigned)FIELD_GET(RISCV_IOMMU_PC_TA_PSCID, ta), pasid);
> +
> + /* 1. invalidate PDT entry */
> + riscv_iommu_iodir_inv_pasid(ep->iommu, ep->devid, pasid);
> +
> + /* 2. invalidate all matching IOATC entries (if PASID was valid) */
> + if (ta & RISCV_IOMMU_PC_TA_V) {
> + riscv_iommu_cmd_inval_vma(&cmd);
> + riscv_iommu_cmd_inval_set_gscid(&cmd, 0);
> + riscv_iommu_cmd_inval_set_pscid(&cmd,
> + FIELD_GET(RISCV_IOMMU_PC_TA_PSCID, ta));
> + riscv_iommu_post(ep->iommu, &cmd);
> + }
> +
> + /* 3. Wait IOATC flush to happen */
> + riscv_iommu_iofence_sync(ep->iommu);
> +
> + /* 4. ATS invalidation */
> + riscv_iommu_cmd_ats_inval(&cmd);
> + riscv_iommu_cmd_ats_set_dseg(&cmd, ep->domid);
> + riscv_iommu_cmd_ats_set_rid(&cmd, ep->devid);
> + riscv_iommu_cmd_ats_set_pid(&cmd, pasid);
> + riscv_iommu_cmd_ats_set_payload(&cmd, payload);
> + riscv_iommu_post(ep->iommu, &cmd);
> +
> + /* 5. Wait DevATC flush to happen */
> + riscv_iommu_iofence_sync(ep->iommu);
> +}
> +
> static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> unsigned long *start, unsigned long *end,
> size_t *pgsize)
> {
> struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> struct riscv_iommu_command cmd;
> + struct riscv_iommu_endpoint *endpoint;
> unsigned long iova;
> + unsigned long payload;
>
> if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
> return;
> @@ -1065,6 +1616,12 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> /* Domain not attached to an IOMMU! */
> BUG_ON(!domain->iommu);
>
> + if (start && end) {
> + payload = riscv_iommu_ats_inval_payload(*start, *end, true);
> + } else {
> + payload = riscv_iommu_ats_inval_all_payload(true);
> + }
> +
> riscv_iommu_cmd_inval_vma(&cmd);
> riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
>
> @@ -1078,6 +1635,20 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> riscv_iommu_post(domain->iommu, &cmd);
> }
> riscv_iommu_iofence_sync(domain->iommu);
> +
> + /* ATS invalidation for every device and for every translation */
> + list_for_each_entry(endpoint, &domain->endpoints, domain) {
> + if (!endpoint->pasid_enabled)
> + continue;
> +
> + riscv_iommu_cmd_ats_inval(&cmd);
> + riscv_iommu_cmd_ats_set_dseg(&cmd, endpoint->domid);
> + riscv_iommu_cmd_ats_set_rid(&cmd, endpoint->devid);
> + riscv_iommu_cmd_ats_set_pid(&cmd, domain->pasid);
> + riscv_iommu_cmd_ats_set_payload(&cmd, payload);
> + riscv_iommu_post(domain->iommu, &cmd);
> + }
> + riscv_iommu_iofence_sync(domain->iommu);
> }
>
> static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> @@ -1310,6 +1881,7 @@ static int riscv_iommu_enable(struct riscv_iommu_device *iommu, unsigned request
> static const struct iommu_domain_ops riscv_iommu_domain_ops = {
> .free = riscv_iommu_domain_free,
> .attach_dev = riscv_iommu_attach_dev,
> + .set_dev_pasid = riscv_iommu_set_dev_pasid,
> .map_pages = riscv_iommu_map_pages,
> .unmap_pages = riscv_iommu_unmap_pages,
> .iova_to_phys = riscv_iommu_iova_to_phys,
> @@ -1326,9 +1898,13 @@ static const struct iommu_ops riscv_iommu_ops = {
> .probe_device = riscv_iommu_probe_device,
> .probe_finalize = riscv_iommu_probe_finalize,
> .release_device = riscv_iommu_release_device,
> + .remove_dev_pasid = riscv_iommu_remove_dev_pasid,
> .device_group = riscv_iommu_device_group,
> .get_resv_regions = riscv_iommu_get_resv_regions,
> .of_xlate = riscv_iommu_of_xlate,
> + .dev_enable_feat = riscv_iommu_dev_enable_feat,
> + .dev_disable_feat = riscv_iommu_dev_disable_feat,
> + .page_response = riscv_iommu_page_response,
> .default_domain_ops = &riscv_iommu_domain_ops,
> };
>
> @@ -1340,6 +1916,7 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
> riscv_iommu_queue_free(iommu, &iommu->cmdq);
> riscv_iommu_queue_free(iommu, &iommu->fltq);
> riscv_iommu_queue_free(iommu, &iommu->priq);
> + iopf_queue_free(iommu->pq_work);
> }
>
> int riscv_iommu_init(struct riscv_iommu_device *iommu)
> @@ -1362,6 +1939,12 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> }
> #endif
>
> + if (iommu->cap & RISCV_IOMMU_CAP_PD20)
> + iommu->iommu.max_pasids = 1u << 20;
> + else if (iommu->cap & RISCV_IOMMU_CAP_PD17)
> + iommu->iommu.max_pasids = 1u << 17;
> + else if (iommu->cap & RISCV_IOMMU_CAP_PD8)
> + iommu->iommu.max_pasids = 1u << 8;
> /*
> * Assign queue lengths from module parameters if not already
> * set on the device tree.
> @@ -1387,6 +1970,13 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> goto fail;
> if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
> goto no_ats;
> + /* PRI functionally depends on ATS’s capabilities. */
> + iommu->pq_work = iopf_queue_alloc(dev_name(dev));
> + if (!iommu->pq_work) {
> + dev_err(dev, "failed to allocate iopf queue\n");
> + ret = -ENOMEM;
> + goto fail;
> + }
>
> ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
> if (ret)
> @@ -1424,5 +2014,6 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> riscv_iommu_queue_free(iommu, &iommu->priq);
> riscv_iommu_queue_free(iommu, &iommu->fltq);
> riscv_iommu_queue_free(iommu, &iommu->cmdq);
> + iopf_queue_free(iommu->pq_work);
> return ret;
> }
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index fe32a4eff14e..83e8d00fd0f8 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -17,9 +17,11 @@
> #include <linux/iova.h>
> #include <linux/io.h>
> #include <linux/idr.h>
> +#include <linux/mmu_notifier.h>
> #include <linux/list.h>
> #include <linux/iommu.h>
> #include <linux/io-pgtable.h>
> +#include <linux/mmu_notifier.h>

You include the mmu_notifier.h twice in this header

>
> #include "iommu-bits.h"
>
> @@ -76,6 +78,9 @@ struct riscv_iommu_device {
> unsigned ddt_mode;
> bool ddtp_in_iomem;
>
> + /* I/O page fault queue */
> + struct iopf_queue *pq_work;
> +
> /* hardware queues */
> struct riscv_iommu_queue cmdq;
> struct riscv_iommu_queue fltq;
> @@ -91,11 +96,14 @@ struct riscv_iommu_domain {
> struct io_pgtable pgtbl;
>
> struct list_head endpoints;
> + struct list_head notifiers;
> struct mutex lock;
> + struct mmu_notifier mn;
> struct riscv_iommu_device *iommu;
>
> unsigned mode; /* RIO_ATP_MODE_* enum */
> unsigned pscid; /* RISC-V IOMMU PSCID */
> + ioasid_t pasid; /* IOMMU_DOMAIN_SVA: Cached PASID */
>
> pgd_t *pgd_root; /* page table root pointer */
> };
> @@ -107,10 +115,16 @@ struct riscv_iommu_endpoint {
> unsigned domid; /* PCI domain number, segment */
> struct rb_node node; /* device tracking node (lookup by devid) */
> struct riscv_iommu_dc *dc; /* device context pointer */
> + struct riscv_iommu_pc *pc; /* process context root, valid if pasid_enabled is true */
> struct riscv_iommu_device *iommu; /* parent iommu device */
>
> struct mutex lock;
> struct list_head domain; /* endpoint attached managed domain */
> +
> + /* end point info bits */
> + unsigned pasid_bits;
> + unsigned pasid_feat;
> + bool pasid_enabled;
> };
>
> /* Helper functions and macros */
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-07-31 09:57:30

by Nick Kossifidis

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On 7/29/23 15:58, Zong Li wrote:
> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <[email protected]> wrote:
>> + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
>> +
>> + /* For now we only support WSIs until we have AIA support */
>
> I'm not completely understand AIA support here, because I saw the pci
> case uses the MSI, and kernel seems to have the AIA implementation.
> Could you please elaborate it?
>

When I wrote this we didn't have AIA in the kernel, and without IMSIC we
can't have MSIs in the hart (we can still have MSIs in the PCIe controller).

>
> Should we define the "interrupt-names" in dt-bindings?
>

Yes we should, along with queue lengths below.

>> +
>> + /* Make sure fctl.WSI is set */
>> + fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
>> + fctl |= RISCV_IOMMU_FCTL_WSI;
>> + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
>> +
>> + /* Parse Queue lengts */
>> + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
>> + if (!ret)
>> + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
>> +
>> + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
>> + if (!ret)
>> + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
>> +
>> + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
>> + if (!ret)
>> + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
>> +
>> dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
>>

2023-07-31 14:26:12

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Mon, Jul 31, 2023 at 5:32 PM Nick Kossifidis <[email protected]> wrote:
>
> On 7/29/23 15:58, Zong Li wrote:
> > On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <[email protected]> wrote:
> >> + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> >> +
> >> + /* For now we only support WSIs until we have AIA support */
> >
> > I'm not completely understand AIA support here, because I saw the pci
> > case uses the MSI, and kernel seems to have the AIA implementation.
> > Could you please elaborate it?
> >
>
> When I wrote this we didn't have AIA in the kernel, and without IMSIC we
> can't have MSIs in the hart (we can still have MSIs in the PCIe controller).

Thanks for your clarification, do we support the MSI in next version?

>
> >
> > Should we define the "interrupt-names" in dt-bindings?
> >
>
> Yes we should, along with queue lengths below.
>
> >> +
> >> + /* Make sure fctl.WSI is set */
> >> + fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> >> + fctl |= RISCV_IOMMU_FCTL_WSI;
> >> + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> >> +
> >> + /* Parse Queue lengts */
> >> + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> >> + if (!ret)
> >> + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> >> +
> >> + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> >> + if (!ret)
> >> + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> >> +
> >> + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> >> + if (!ret)
> >> + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> >> +
> >> dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> >>

2023-07-31 23:51:01

by Nick Kossifidis

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On 7/31/23 16:15, Zong Li wrote:
> On Mon, Jul 31, 2023 at 5:32 PM Nick Kossifidis <[email protected]> wrote:
>>
>> On 7/29/23 15:58, Zong Li wrote:
>>> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <[email protected]> wrote:
>>>> + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
>>>> +
>>>> + /* For now we only support WSIs until we have AIA support */
>>>
>>> I'm not completely understand AIA support here, because I saw the pci
>>> case uses the MSI, and kernel seems to have the AIA implementation.
>>> Could you please elaborate it?
>>>
>>
>> When I wrote this we didn't have AIA in the kernel, and without IMSIC we
>> can't have MSIs in the hart (we can still have MSIs in the PCIe controller).
>
> Thanks for your clarification, do we support the MSI in next version?
>

I don't think there is an IOMMU implementation out there (emulated or in
hw) that can do MSIs and is not a pcie device (the QEMU implementation
is a pcie device). If we have something to test this against, and we
also have an IMSIC etc, we can work on that.

2023-08-01 01:16:03

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Tue, Aug 1, 2023 at 7:35 AM Nick Kossifidis <[email protected]> wrote:
>
> On 7/31/23 16:15, Zong Li wrote:
> > On Mon, Jul 31, 2023 at 5:32 PM Nick Kossifidis <[email protected]> wrote:
> >>
> >> On 7/29/23 15:58, Zong Li wrote:
> >>> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <[email protected]> wrote:
> >>>> + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> >>>> +
> >>>> + /* For now we only support WSIs until we have AIA support */
> >>>
> >>> I'm not completely understand AIA support here, because I saw the pci
> >>> case uses the MSI, and kernel seems to have the AIA implementation.
> >>> Could you please elaborate it?
> >>>
> >>
> >> When I wrote this we didn't have AIA in the kernel, and without IMSIC we
> >> can't have MSIs in the hart (we can still have MSIs in the PCIe controller).
> >
> > Thanks for your clarification, do we support the MSI in next version?
> >
>
> I don't think there is an IOMMU implementation out there (emulated or in
> hw) that can do MSIs and is not a pcie device (the QEMU implementation
> is a pcie device). If we have something to test this against, and we
> also have an IMSIC etc, we can work on that.

I guess I can assist with that. We have an IOMMU hardware (non-pcie
device) that has already implemented the MSI functionality, and I have
conducted testing on it. Perhaps let me add the related implementation
here after this series is merged.

2023-08-02 21:11:06

by Tomasz Jeznach

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Sat, Jul 29, 2023 at 5:58 AM Zong Li <[email protected]> wrote:
>
> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <[email protected]> wrote:
> >
> > Enables message or wire signal interrupts for PCIe and platforms devices.
> >
> > Co-developed-by: Nick Kossifidis <[email protected]>
> > Signed-off-by: Nick Kossifidis <[email protected]>
> > Signed-off-by: Tomasz Jeznach <[email protected]>
> > ---
> > drivers/iommu/riscv/iommu-pci.c | 72 ++++
> > drivers/iommu/riscv/iommu-platform.c | 66 +++
> > drivers/iommu/riscv/iommu.c | 604 ++++++++++++++++++++++++++-
> > drivers/iommu/riscv/iommu.h | 28 ++
> > 4 files changed, 769 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> > index c91f963d7a29..9ea0647f7b92 100644
> > --- a/drivers/iommu/riscv/iommu-pci.c
> > +++ b/drivers/iommu/riscv/iommu-pci.c
> > @@ -34,6 +34,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> > {
> > struct device *dev = &pdev->dev;
> > struct riscv_iommu_device *iommu;
> > + u64 icvec;
> > int ret;
> >
> > ret = pci_enable_device_mem(pdev);
> > @@ -67,14 +68,84 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> > iommu->dev = dev;
> > dev_set_drvdata(dev, iommu);
> >
> > + /* Check device reported capabilities. */
> > + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> > +
> > + /* The PCI driver only uses MSIs, make sure the IOMMU supports this */
> > + switch (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap)) {
> > + case RISCV_IOMMU_CAP_IGS_MSI:
> > + case RISCV_IOMMU_CAP_IGS_BOTH:
> > + break;
> > + default:
> > + dev_err(dev, "unable to use message-signaled interrupts\n");
> > + ret = -ENODEV;
> > + goto fail;
> > + }
> > +
> > dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > pci_set_master(pdev);
> >
> > + /* Allocate and assign IRQ vectors for the various events */
> > + ret = pci_alloc_irq_vectors(pdev, 1, RISCV_IOMMU_INTR_COUNT, PCI_IRQ_MSIX);
> > + if (ret < 0) {
> > + dev_err(dev, "unable to allocate irq vectors\n");
> > + goto fail;
> > + }
> > +
> > + ret = -ENODEV;
> > +
> > + iommu->irq_cmdq = msi_get_virq(dev, RISCV_IOMMU_INTR_CQ);
> > + if (!iommu->irq_cmdq) {
> > + dev_warn(dev, "no MSI vector %d for the command queue\n",
> > + RISCV_IOMMU_INTR_CQ);
> > + goto fail;
> > + }
> > +
> > + iommu->irq_fltq = msi_get_virq(dev, RISCV_IOMMU_INTR_FQ);
> > + if (!iommu->irq_fltq) {
> > + dev_warn(dev, "no MSI vector %d for the fault/event queue\n",
> > + RISCV_IOMMU_INTR_FQ);
> > + goto fail;
> > + }
> > +
> > + if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> > + iommu->irq_pm = msi_get_virq(dev, RISCV_IOMMU_INTR_PM);
> > + if (!iommu->irq_pm) {
> > + dev_warn(dev,
> > + "no MSI vector %d for performance monitoring\n",
> > + RISCV_IOMMU_INTR_PM);
> > + goto fail;
> > + }
> > + }
> > +
> > + if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> > + iommu->irq_priq = msi_get_virq(dev, RISCV_IOMMU_INTR_PQ);
> > + if (!iommu->irq_priq) {
> > + dev_warn(dev,
> > + "no MSI vector %d for page-request queue\n",
> > + RISCV_IOMMU_INTR_PQ);
> > + goto fail;
> > + }
> > + }
> > +
> > + /* Set simple 1:1 mapping for MSI vectors */
> > + icvec = FIELD_PREP(RISCV_IOMMU_IVEC_CIV, RISCV_IOMMU_INTR_CQ) |
> > + FIELD_PREP(RISCV_IOMMU_IVEC_FIV, RISCV_IOMMU_INTR_FQ);
> > +
> > + if (iommu->cap & RISCV_IOMMU_CAP_HPM)
> > + icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PMIV, RISCV_IOMMU_INTR_PM);
> > +
> > + if (iommu->cap & RISCV_IOMMU_CAP_ATS)
> > + icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PIV, RISCV_IOMMU_INTR_PQ);
> > +
> > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IVEC, icvec);
> > +
> > ret = riscv_iommu_init(iommu);
> > if (!ret)
> > return ret;
> >
> > fail:
> > + pci_free_irq_vectors(pdev);
> > pci_clear_master(pdev);
> > pci_release_regions(pdev);
> > pci_disable_device(pdev);
> > @@ -85,6 +156,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> > static void riscv_iommu_pci_remove(struct pci_dev *pdev)
> > {
> > riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> > + pci_free_irq_vectors(pdev);
> > pci_clear_master(pdev);
> > pci_release_regions(pdev);
> > pci_disable_device(pdev);
> > diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> > index e4e8ca6711e7..35935d3c7ef4 100644
> > --- a/drivers/iommu/riscv/iommu-platform.c
> > +++ b/drivers/iommu/riscv/iommu-platform.c
> > @@ -20,6 +20,8 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
> > struct device *dev = &pdev->dev;
> > struct riscv_iommu_device *iommu = NULL;
> > struct resource *res = NULL;
> > + u32 fctl = 0;
> > + int irq = 0;
> > int ret = 0;
> >
> > iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> > @@ -53,6 +55,70 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
> > goto fail;
> > }
> >
> > + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> > +
> > + /* For now we only support WSIs until we have AIA support */
>
> I'm not completely understand AIA support here, because I saw the pci
> case uses the MSI, and kernel seems to have the AIA implementation.
> Could you please elaborate it?
>
> > + ret = FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap);
> > + if (ret == RISCV_IOMMU_CAP_IGS_MSI) {
> > + dev_err(dev, "IOMMU only supports MSIs\n");
> > + goto fail;
> > + }
> > +
> > + /* Parse IRQ assignment */
> > + irq = platform_get_irq_byname_optional(pdev, "cmdq");
> > + if (irq > 0)
> > + iommu->irq_cmdq = irq;
> > + else {
> > + dev_err(dev, "no IRQ provided for the command queue\n");
> > + goto fail;
> > + }
> > +
> > + irq = platform_get_irq_byname_optional(pdev, "fltq");
> > + if (irq > 0)
> > + iommu->irq_fltq = irq;
> > + else {
> > + dev_err(dev, "no IRQ provided for the fault/event queue\n");
> > + goto fail;
> > + }
> > +
> > + if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> > + irq = platform_get_irq_byname_optional(pdev, "pm");
> > + if (irq > 0)
> > + iommu->irq_pm = irq;
> > + else {
> > + dev_err(dev, "no IRQ provided for performance monitoring\n");
> > + goto fail;
> > + }
> > + }
> > +
> > + if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> > + irq = platform_get_irq_byname_optional(pdev, "priq");
> > + if (irq > 0)
> > + iommu->irq_priq = irq;
> > + else {
> > + dev_err(dev, "no IRQ provided for the page-request queue\n");
> > + goto fail;
> > + }
> > + }
>
> Should we define the "interrupt-names" in dt-bindings?
>

Yes, this was brought up earlier wrt dt-bindings.

I'm considering removal of interrupt names from DT (and get-byname
option), as IOMMU hardware cause-to-vector remapping `icvec` should be
used to map interrupt source to actual interrupt vector. If possible
device driver should map cause to interrupt (based on number of
vectors available) or rely on ICVEC WARL properties to discover fixed
cause-to-vector mapping in the hardware.

Please let me know if this is reasonable change.

> > +
> > + /* Make sure fctl.WSI is set */
> > + fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> > + fctl |= RISCV_IOMMU_FCTL_WSI;
> > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> > +
> > + /* Parse Queue lengts */
> > + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > + if (!ret)
> > + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > +
> > + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > + if (!ret)
> > + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > +
> > + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > + if (!ret)
> > + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > +
> > dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> >
> > return riscv_iommu_init(iommu);
> > diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> > index 31dc3c458e13..5c4cf9875302 100644
> > --- a/drivers/iommu/riscv/iommu.c
> > +++ b/drivers/iommu/riscv/iommu.c
> > @@ -45,6 +45,18 @@ static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
> > module_param(ddt_mode, int, 0644);
> > MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
> >
> > +static int cmdq_length = 1024;
> > +module_param(cmdq_length, int, 0644);
> > +MODULE_PARM_DESC(cmdq_length, "Command queue length.");
> > +
> > +static int fltq_length = 1024;
> > +module_param(fltq_length, int, 0644);
> > +MODULE_PARM_DESC(fltq_length, "Fault queue length.");
> > +
> > +static int priq_length = 1024;
> > +module_param(priq_length, int, 0644);
> > +MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
> > +
> > /* IOMMU PSCID allocation namespace. */
> > #define RISCV_IOMMU_MAX_PSCID (1U << 20)
> > static DEFINE_IDA(riscv_iommu_pscids);
> > @@ -65,6 +77,497 @@ static DEFINE_IDA(riscv_iommu_pscids);
> > static const struct iommu_domain_ops riscv_iommu_domain_ops;
> > static const struct iommu_ops riscv_iommu_ops;
> >
> > +/*
> > + * Common queue management routines
> > + */
> > +
> > +/* Note: offsets are the same for all queues */
> > +#define Q_HEAD(q) ((q)->qbr + (RISCV_IOMMU_REG_CQH - RISCV_IOMMU_REG_CQB))
> > +#define Q_TAIL(q) ((q)->qbr + (RISCV_IOMMU_REG_CQT - RISCV_IOMMU_REG_CQB))
> > +
> > +static unsigned riscv_iommu_queue_consume(struct riscv_iommu_device *iommu,
> > + struct riscv_iommu_queue *q, unsigned *ready)
> > +{
> > + u32 tail = riscv_iommu_readl(iommu, Q_TAIL(q));
> > + *ready = q->lui;
> > +
> > + BUG_ON(q->cnt <= tail);
> > + if (q->lui <= tail)
> > + return tail - q->lui;
> > + return q->cnt - q->lui;
> > +}
> > +
> > +static void riscv_iommu_queue_release(struct riscv_iommu_device *iommu,
> > + struct riscv_iommu_queue *q, unsigned count)
> > +{
> > + q->lui = (q->lui + count) & (q->cnt - 1);
> > + riscv_iommu_writel(iommu, Q_HEAD(q), q->lui);
> > +}
> > +
> > +static u32 riscv_iommu_queue_ctrl(struct riscv_iommu_device *iommu,
> > + struct riscv_iommu_queue *q, u32 val)
> > +{
> > + cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
> > +
> > + riscv_iommu_writel(iommu, q->qcr, val);
> > + do {
> > + val = riscv_iommu_readl(iommu, q->qcr);
> > + if (!(val & RISCV_IOMMU_QUEUE_BUSY))
> > + break;
> > + cpu_relax();
> > + } while (get_cycles() < end_cycles);
> > +
> > + return val;
> > +}
> > +
> > +static void riscv_iommu_queue_free(struct riscv_iommu_device *iommu,
> > + struct riscv_iommu_queue *q)
> > +{
> > + size_t size = q->len * q->cnt;
> > +
> > + riscv_iommu_queue_ctrl(iommu, q, 0);
> > +
> > + if (q->base) {
> > + if (q->in_iomem)
> > + iounmap(q->base);
> > + else
> > + dmam_free_coherent(iommu->dev, size, q->base, q->base_dma);
> > + }
> > + if (q->irq)
> > + free_irq(q->irq, q);
> > +}
> > +
> > +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data);
> > +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data);
> > +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data);
> > +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data);
> > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > +
> > +static int riscv_iommu_queue_init(struct riscv_iommu_device *iommu, int queue_id)
> > +{
> > + struct device *dev = iommu->dev;
> > + struct riscv_iommu_queue *q = NULL;
> > + size_t queue_size = 0;
> > + irq_handler_t irq_check;
> > + irq_handler_t irq_process;
> > + const char *name;
> > + int count = 0;
> > + int irq = 0;
> > + unsigned order = 0;
> > + u64 qbr_val = 0;
> > + u64 qbr_readback = 0;
> > + u64 qbr_paddr = 0;
> > + int ret = 0;
> > +
> > + switch (queue_id) {
> > + case RISCV_IOMMU_COMMAND_QUEUE:
> > + q = &iommu->cmdq;
> > + q->len = sizeof(struct riscv_iommu_command);
> > + count = iommu->cmdq_len;
> > + irq = iommu->irq_cmdq;
> > + irq_check = riscv_iommu_cmdq_irq_check;
> > + irq_process = riscv_iommu_cmdq_process;
> > + q->qbr = RISCV_IOMMU_REG_CQB;
> > + q->qcr = RISCV_IOMMU_REG_CQCSR;
> > + name = "cmdq";
> > + break;
> > + case RISCV_IOMMU_FAULT_QUEUE:
> > + q = &iommu->fltq;
> > + q->len = sizeof(struct riscv_iommu_fq_record);
> > + count = iommu->fltq_len;
> > + irq = iommu->irq_fltq;
> > + irq_check = riscv_iommu_fltq_irq_check;
> > + irq_process = riscv_iommu_fltq_process;
> > + q->qbr = RISCV_IOMMU_REG_FQB;
> > + q->qcr = RISCV_IOMMU_REG_FQCSR;
> > + name = "fltq";
> > + break;
> > + case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > + q = &iommu->priq;
> > + q->len = sizeof(struct riscv_iommu_pq_record);
> > + count = iommu->priq_len;
> > + irq = iommu->irq_priq;
> > + irq_check = riscv_iommu_priq_irq_check;
> > + irq_process = riscv_iommu_priq_process;
> > + q->qbr = RISCV_IOMMU_REG_PQB;
> > + q->qcr = RISCV_IOMMU_REG_PQCSR;
> > + name = "priq";
> > + break;
> > + default:
> > + dev_err(dev, "invalid queue interrupt index in queue_init!\n");
> > + return -EINVAL;
> > + }
> > +
> > + /* Polling not implemented */
> > + if (!irq)
> > + return -ENODEV;
> > +
> > + /* Allocate queue in memory and set the base register */
> > + order = ilog2(count);
> > + do {
> > + queue_size = q->len * (1ULL << order);
> > + q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> > + if (q->base || queue_size < PAGE_SIZE)
> > + break;
> > +
> > + order--;
> > + } while (1);
> > +
> > + if (!q->base) {
> > + dev_err(dev, "failed to allocate %s queue (cnt: %u)\n", name, count);
> > + return -ENOMEM;
> > + }
> > +
> > + q->cnt = 1ULL << order;
> > +
> > + qbr_val = phys_to_ppn(q->base_dma) |
> > + FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> > +
> > + riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> > +
> > + /*
> > + * Queue base registers are WARL, so it's possible that whatever we wrote
> > + * there was illegal/not supported by the hw in which case we need to make
> > + * sure we set a supported PPN and/or queue size.
> > + */
> > + qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> > + if (qbr_readback == qbr_val)
> > + goto irq;
> > +
> > + dmam_free_coherent(dev, queue_size, q->base, q->base_dma);
> > +
> > + /* Get supported queue size */
> > + order = FIELD_GET(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, qbr_readback) + 1;
> > + q->cnt = 1ULL << order;
> > + queue_size = q->len * q->cnt;
> > +
> > + /*
> > + * In case we also failed to set PPN, it means the field is hardcoded and the
> > + * queue resides in I/O memory instead, so get its physical address and
> > + * ioremap it.
> > + */
> > + qbr_paddr = ppn_to_phys(qbr_readback);
> > + if (qbr_paddr != q->base_dma) {
> > + dev_info(dev,
> > + "hardcoded ppn in %s base register, using io memory for the queue\n",
> > + name);
> > + dev_info(dev, "queue length for %s set to %i\n", name, q->cnt);
> > + q->in_iomem = true;
> > + q->base = ioremap(qbr_paddr, queue_size);
> > + if (!q->base) {
> > + dev_err(dev, "failed to map %s queue (cnt: %u)\n", name, q->cnt);
> > + return -ENOMEM;
> > + }
> > + q->base_dma = qbr_paddr;
> > + } else {
> > + /*
> > + * We only failed to set the queue size, re-try to allocate memory with
> > + * the queue size supported by the hw.
> > + */
> > + dev_info(dev, "hardcoded queue size in %s base register\n", name);
> > + dev_info(dev, "retrying with queue length: %i\n", q->cnt);
> > + q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> > + if (!q->base) {
> > + dev_err(dev, "failed to allocate %s queue (cnt: %u)\n",
> > + name, q->cnt);
> > + return -ENOMEM;
> > + }
> > + }
> > +
> > + qbr_val = phys_to_ppn(q->base_dma) |
> > + FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> > + riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> > +
> > + /* Final check to make sure hw accepted our write */
> > + qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> > + if (qbr_readback != qbr_val) {
> > + dev_err(dev, "failed to set base register for %s\n", name);
> > + goto fail;
> > + }
> > +
> > + irq:
> > + if (request_threaded_irq(irq, irq_check, irq_process, IRQF_ONESHOT | IRQF_SHARED,
> > + dev_name(dev), q)) {
> > + dev_err(dev, "fail to request irq %d for %s\n", irq, name);
> > + goto fail;
> > + }
> > +
> > + q->irq = irq;
> > +
> > + /* Note: All RIO_xQ_EN/IE fields are in the same offsets */
> > + ret =
> > + riscv_iommu_queue_ctrl(iommu, q,
> > + RISCV_IOMMU_QUEUE_ENABLE |
> > + RISCV_IOMMU_QUEUE_INTR_ENABLE);
> > + if (ret & RISCV_IOMMU_QUEUE_BUSY) {
> > + dev_err(dev, "%s init timeout\n", name);
> > + ret = -EBUSY;
> > + goto fail;
> > + }
> > +
> > + return 0;
> > +
> > + fail:
> > + riscv_iommu_queue_free(iommu, q);
> > + return 0;
> > +}
> > +
> > +/*
> > + * I/O MMU Command queue chapter 3.1
> > + */
> > +
> > +static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
> > +{
> > + cmd->dword0 =
> > + FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
> > + RISCV_IOMMU_CMD_IOTINVAL_OPCODE) | FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
> > + RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);
> > + cmd->dword1 = 0;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > + u64 addr)
> > +{
> > + cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > + cmd->dword1 = addr;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
> > + unsigned pscid)
> > +{
> > + cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
> > + RISCV_IOMMU_CMD_IOTINVAL_PSCV;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_inval_set_gscid(struct riscv_iommu_command *cmd,
> > + unsigned gscid)
> > +{
> > + cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_GSCID, gscid) |
> > + RISCV_IOMMU_CMD_IOTINVAL_GV;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_iofence(struct riscv_iommu_command *cmd)
> > +{
> > + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> > + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C);
> > + cmd->dword1 = 0;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_iofence_set_av(struct riscv_iommu_command *cmd,
> > + u64 addr, u32 data)
> > +{
> > + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> > + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
> > + FIELD_PREP(RISCV_IOMMU_CMD_IOFENCE_DATA, data) | RISCV_IOMMU_CMD_IOFENCE_AV;
> > + cmd->dword1 = (addr >> 2);
> > +}
> > +
> > +static inline void riscv_iommu_cmd_iodir_inval_ddt(struct riscv_iommu_command *cmd)
> > +{
> > + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> > + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT);
> > + cmd->dword1 = 0;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_iodir_inval_pdt(struct riscv_iommu_command *cmd)
> > +{
> > + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> > + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT);
> > + cmd->dword1 = 0;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd,
> > + unsigned devid)
> > +{
> > + cmd->dword0 |=
> > + FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
> > +}
> > +
> > +/* TODO: Convert into lock-less MPSC implementation. */
> > +static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
> > + struct riscv_iommu_command *cmd, bool sync)
> > +{
> > + u32 head, tail, next, last;
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&iommu->cq_lock, flags);
> > + head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) & (iommu->cmdq.cnt - 1);
> > + tail = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> > + last = iommu->cmdq.lui;
> > + if (tail != last) {
> > + spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > + /*
> > + * FIXME: This is a workaround for dropped MMIO writes/reads on QEMU platform.
> > + * While debugging of the problem is still ongoing, this provides
> > + * a simple impolementation of try-again policy.
> > + * Will be changed to lock-less algorithm in the feature.
> > + */
> > + dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (1st)\n", last, tail);
> > + spin_lock_irqsave(&iommu->cq_lock, flags);
> > + tail =
> > + riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> > + last = iommu->cmdq.lui;
> > + if (tail != last) {
> > + spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > + dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (2nd)\n", last, tail);
> > + spin_lock_irqsave(&iommu->cq_lock, flags);
> > + }
> > + }
> > +
> > + next = (last + 1) & (iommu->cmdq.cnt - 1);
> > + if (next != head) {
> > + struct riscv_iommu_command *ptr = iommu->cmdq.base;
> > + ptr[last] = *cmd;
> > + wmb();
> > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_CQT, next);
> > + iommu->cmdq.lui = next;
> > + }
> > +
> > + spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > +
> > + if (sync && head != next) {
> > + cycles_t start_time = get_cycles();
> > + while (1) {
> > + last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
> > + (iommu->cmdq.cnt - 1);
> > + if (head < next && last >= next)
> > + break;
> > + if (head > next && last < head && last >= next)
> > + break;
> > + if (RISCV_IOMMU_TIMEOUT < (get_cycles() - start_time)) {
>
> This condition will be imprecise, because here is not in irq disabled
> context, it will be scheduled out or preempted. When we come back
> here, it might be over 1 second, but the IOFENCE is actually
> completed.
>

Good point. Thank.


> > + dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
> > + return false;
> > + }
> > + cpu_relax();
> > + }
> > + }
> > +
> > + return next != head;
> > +}
> > +
> > +static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
> > + struct riscv_iommu_command *cmd)
> > +{
> > + return riscv_iommu_post_sync(iommu, cmd, false);
> > +}
> > +
> > +static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
> > +{
> > + struct riscv_iommu_command cmd;
> > + riscv_iommu_cmd_iofence(&cmd);
> > + return riscv_iommu_post_sync(iommu, &cmd, true);
> > +}
> > +
> > +/* Command queue primary interrupt handler */
> > +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
> > +{
> > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > + struct riscv_iommu_device *iommu =
> > + container_of(q, struct riscv_iommu_device, cmdq);
> > + u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > + if (ipsr & RISCV_IOMMU_IPSR_CIP)
> > + return IRQ_WAKE_THREAD;
> > + return IRQ_NONE;
> > +}
> > +
> > +/* Command queue interrupt hanlder thread function */
> > +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
> > +{
> > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > + struct riscv_iommu_device *iommu;
> > + unsigned ctrl;
> > +
> > + iommu = container_of(q, struct riscv_iommu_device, cmdq);
> > +
> > + /* Error reporting, clear error reports if any. */
> > + ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQCSR);
> > + if (ctrl & (RISCV_IOMMU_CQCSR_CQMF |
> > + RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL)) {
> > + riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);
> > + dev_warn_ratelimited(iommu->dev,
> > + "Command queue error: fault: %d tout: %d err: %d\n",
> > + !!(ctrl & RISCV_IOMMU_CQCSR_CQMF),
> > + !!(ctrl & RISCV_IOMMU_CQCSR_CMD_TO),
> > + !!(ctrl & RISCV_IOMMU_CQCSR_CMD_ILL));
>
> We need to handle the error by either adjusting the tail to remove the
> failed command or fixing the failed command itself. Otherwise, the
> failed command will keep in the queue and IOMMU will try to execute
> it. I guess the first option might be easier to implement.
>

Correct. Thanks for pointing this out.
Error handling / recovery was not pushed in this series. There is
work-in-progress series to handle various types of failures, including
command processing errors, DDT misconfiguration, queue overflows,
device reported faults handling, etc. I can bring some of the error
handling here, if needed. Otherwise I'd prefer to keep it as separate
series, sent out once this one is merged.

> > + }
> > +
> > + /* Clear fault interrupt pending. */
> > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_CIP);
> > +
> > + return IRQ_HANDLED;
> > +}
> > +
> > +/*
> > + * Fault/event queue, chapter 3.2
> > + */
> > +
> > +static void riscv_iommu_fault_report(struct riscv_iommu_device *iommu,
> > + struct riscv_iommu_fq_record *event)
> > +{
> > + unsigned err, devid;
> > +
> > + err = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, event->hdr);
> > + devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, event->hdr);
> > +
> > + dev_warn_ratelimited(iommu->dev,
> > + "Fault %d devid: %d" " iotval: %llx iotval2: %llx\n", err,
> > + devid, event->iotval, event->iotval2);
> > +}
> > +
> > +/* Fault/event queue primary interrupt handler */
> > +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data)
> > +{
> > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > + struct riscv_iommu_device *iommu =
> > + container_of(q, struct riscv_iommu_device, fltq);
> > + u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > + if (ipsr & RISCV_IOMMU_IPSR_FIP)
> > + return IRQ_WAKE_THREAD;
> > + return IRQ_NONE;
> > +}
> > +
> > +/* Fault queue interrupt hanlder thread function */
> > +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data)
> > +{
> > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > + struct riscv_iommu_device *iommu;
> > + struct riscv_iommu_fq_record *events;
> > + unsigned cnt, len, idx, ctrl;
> > +
> > + iommu = container_of(q, struct riscv_iommu_device, fltq);
> > + events = (struct riscv_iommu_fq_record *)q->base;
> > +
> > + /* Error reporting, clear error reports if any. */
> > + ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQCSR);
> > + if (ctrl & (RISCV_IOMMU_FQCSR_FQMF | RISCV_IOMMU_FQCSR_FQOF)) {
> > + riscv_iommu_queue_ctrl(iommu, &iommu->fltq, ctrl);
> > + dev_warn_ratelimited(iommu->dev,
> > + "Fault queue error: fault: %d full: %d\n",
> > + !!(ctrl & RISCV_IOMMU_FQCSR_FQMF),
> > + !!(ctrl & RISCV_IOMMU_FQCSR_FQOF));
> > + }
> > +
> > + /* Clear fault interrupt pending. */
> > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_FIP);
> > +
> > + /* Report fault events. */
> > + do {
> > + cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> > + if (!cnt)
> > + break;
> > + for (len = 0; len < cnt; idx++, len++)
> > + riscv_iommu_fault_report(iommu, &events[idx]);
> > + riscv_iommu_queue_release(iommu, q, cnt);
> > + } while (1);
> > +
> > + return IRQ_HANDLED;
> > +}
> > +
> > +/*
> > + * Page request queue, chapter 3.3
> > + */
> > +
> > /*
> > * Register device for IOMMU tracking.
> > */
> > @@ -97,6 +600,54 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
> > mutex_unlock(&iommu->eps_mutex);
> > }
> >
> > +/* Page request interface queue primary interrupt handler */
> > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
> > +{
> > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > + struct riscv_iommu_device *iommu =
> > + container_of(q, struct riscv_iommu_device, priq);
> > + u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > + if (ipsr & RISCV_IOMMU_IPSR_PIP)
> > + return IRQ_WAKE_THREAD;
> > + return IRQ_NONE;
> > +}
> > +
> > +/* Page request interface queue interrupt hanlder thread function */
> > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> > +{
> > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > + struct riscv_iommu_device *iommu;
> > + struct riscv_iommu_pq_record *requests;
> > + unsigned cnt, idx, ctrl;
> > +
> > + iommu = container_of(q, struct riscv_iommu_device, priq);
> > + requests = (struct riscv_iommu_pq_record *)q->base;
> > +
> > + /* Error reporting, clear error reports if any. */
> > + ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_PQCSR);
> > + if (ctrl & (RISCV_IOMMU_PQCSR_PQMF | RISCV_IOMMU_PQCSR_PQOF)) {
> > + riscv_iommu_queue_ctrl(iommu, &iommu->priq, ctrl);
> > + dev_warn_ratelimited(iommu->dev,
> > + "Page request queue error: fault: %d full: %d\n",
> > + !!(ctrl & RISCV_IOMMU_PQCSR_PQMF),
> > + !!(ctrl & RISCV_IOMMU_PQCSR_PQOF));
> > + }
> > +
> > + /* Clear page request interrupt pending. */
> > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_PIP);
> > +
> > + /* Process page requests. */
> > + do {
> > + cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> > + if (!cnt)
> > + break;
> > + dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
> > + riscv_iommu_queue_release(iommu, q, cnt);
> > + } while (1);
> > +
> > + return IRQ_HANDLED;
> > +}
> > +
> > /*
> > * Endpoint management
> > */
> > @@ -350,7 +901,29 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> > unsigned long *start, unsigned long *end,
> > size_t *pgsize)
> > {
> > - /* Command interface not implemented */
> > + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> > + struct riscv_iommu_command cmd;
> > + unsigned long iova;
> > +
> > + if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
> > + return;
> > +
> > + /* Domain not attached to an IOMMU! */
> > + BUG_ON(!domain->iommu);
> > +
> > + riscv_iommu_cmd_inval_vma(&cmd);
> > + riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
> > +
> > + if (start && end && pgsize) {
> > + /* Cover only the range that is needed */
> > + for (iova = *start; iova <= *end; iova += *pgsize) {
> > + riscv_iommu_cmd_inval_set_addr(&cmd, iova);
> > + riscv_iommu_post(domain->iommu, &cmd);
> > + }
> > + } else {
> > + riscv_iommu_post(domain->iommu, &cmd);
> > + }
> > + riscv_iommu_iofence_sync(domain->iommu);
> > }
> >
> > static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> > @@ -610,6 +1183,9 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
> > iommu_device_unregister(&iommu->iommu);
> > iommu_device_sysfs_remove(&iommu->iommu);
> > riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> > + riscv_iommu_queue_free(iommu, &iommu->cmdq);
> > + riscv_iommu_queue_free(iommu, &iommu->fltq);
> > + riscv_iommu_queue_free(iommu, &iommu->priq);
> > }
> >
> > int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > @@ -632,6 +1208,16 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > }
> > #endif
> >
> > + /*
> > + * Assign queue lengths from module parameters if not already
> > + * set on the device tree.
> > + */
> > + if (!iommu->cmdq_len)
> > + iommu->cmdq_len = cmdq_length;
> > + if (!iommu->fltq_len)
> > + iommu->fltq_len = fltq_length;
> > + if (!iommu->priq_len)
> > + iommu->priq_len = priq_length;
> > /* Clear any pending interrupt flag. */
> > riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
> > RISCV_IOMMU_IPSR_CIP |
> > @@ -639,7 +1225,20 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
> > spin_lock_init(&iommu->cq_lock);
> > mutex_init(&iommu->eps_mutex);
> > + ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_COMMAND_QUEUE);
> > + if (ret)
> > + goto fail;
> > + ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_FAULT_QUEUE);
> > + if (ret)
> > + goto fail;
> > + if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
> > + goto no_ats;
> > +
> > + ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
> > + if (ret)
> > + goto fail;
> >
> > + no_ats:
> > ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
> >
> > if (ret) {
> > @@ -663,5 +1262,8 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > return 0;
> > fail:
> > riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> > + riscv_iommu_queue_free(iommu, &iommu->priq);
> > + riscv_iommu_queue_free(iommu, &iommu->fltq);
> > + riscv_iommu_queue_free(iommu, &iommu->cmdq);
> > return ret;
> > }
> > diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> > index 7dc9baa59a50..04148a2a8ffd 100644
> > --- a/drivers/iommu/riscv/iommu.h
> > +++ b/drivers/iommu/riscv/iommu.h
> > @@ -28,6 +28,24 @@
> > #define IOMMU_PAGE_SIZE_1G BIT_ULL(30)
> > #define IOMMU_PAGE_SIZE_512G BIT_ULL(39)
> >
> > +struct riscv_iommu_queue {
> > + dma_addr_t base_dma; /* ring buffer bus address */
> > + void *base; /* ring buffer pointer */
> > + size_t len; /* single item length */
> > + u32 cnt; /* items count */
> > + u32 lui; /* last used index, consumer/producer share */
> > + unsigned qbr; /* queue base register offset */
> > + unsigned qcr; /* queue control and status register offset */
> > + int irq; /* registered interrupt number */
> > + bool in_iomem; /* indicates queue data are in I/O memory */
> > +};
> > +
> > +enum riscv_queue_ids {
> > + RISCV_IOMMU_COMMAND_QUEUE = 0,
> > + RISCV_IOMMU_FAULT_QUEUE = 1,
> > + RISCV_IOMMU_PAGE_REQUEST_QUEUE = 2
> > +};
> > +
> > struct riscv_iommu_device {
> > struct iommu_device iommu; /* iommu core interface */
> > struct device *dev; /* iommu hardware */
> > @@ -42,6 +60,11 @@ struct riscv_iommu_device {
> > int irq_pm;
> > int irq_priq;
> >
> > + /* Queue lengths */
> > + int cmdq_len;
> > + int fltq_len;
> > + int priq_len;
> > +
> > /* supported and enabled hardware capabilities */
> > u64 cap;
> >
> > @@ -53,6 +76,11 @@ struct riscv_iommu_device {
> > unsigned ddt_mode;
> > bool ddtp_in_iomem;
> >
> > + /* hardware queues */
> > + struct riscv_iommu_queue cmdq;
> > + struct riscv_iommu_queue fltq;
> > + struct riscv_iommu_queue priq;
> > +
> > /* Connected end-points */
> > struct rb_root eps;
> > struct mutex eps_mutex;
> > --
> > 2.34.1
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

best,
- Tomasz

2023-08-02 22:21:41

by Tomasz Jeznach

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Mon, Jul 31, 2023 at 5:38 PM Zong Li <[email protected]> wrote:
>
> On Tue, Aug 1, 2023 at 7:35 AM Nick Kossifidis <[email protected]> wrote:
> >
> > On 7/31/23 16:15, Zong Li wrote:
> > > On Mon, Jul 31, 2023 at 5:32 PM Nick Kossifidis <[email protected]> wrote:
> > >>
> > >> On 7/29/23 15:58, Zong Li wrote:
> > >>> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <[email protected]> wrote:
> > >>>> + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> > >>>> +
> > >>>> + /* For now we only support WSIs until we have AIA support */
> > >>>
> > >>> I'm not completely understand AIA support here, because I saw the pci
> > >>> case uses the MSI, and kernel seems to have the AIA implementation.
> > >>> Could you please elaborate it?
> > >>>
> > >>
> > >> When I wrote this we didn't have AIA in the kernel, and without IMSIC we
> > >> can't have MSIs in the hart (we can still have MSIs in the PCIe controller).
> > >
> > > Thanks for your clarification, do we support the MSI in next version?
> > >
> >
> > I don't think there is an IOMMU implementation out there (emulated or in
> > hw) that can do MSIs and is not a pcie device (the QEMU implementation
> > is a pcie device). If we have something to test this against, and we
> > also have an IMSIC etc, we can work on that.
>
> I guess I can assist with that. We have an IOMMU hardware (non-pcie
> device) that has already implemented the MSI functionality, and I have
> conducted testing on it. Perhaps let me add the related implementation
> here after this series is merged.

Thanks, getting MSI support for non-PCIe IOMMU hardware would be great!

best,
- Tomasz

2023-08-03 09:02:38

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues

On Thu, Aug 3, 2023 at 4:50 AM Tomasz Jeznach <[email protected]> wrote:
>
> On Sat, Jul 29, 2023 at 5:58 AM Zong Li <[email protected]> wrote:
> >
> > On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <[email protected]> wrote:
> > >
> > > Enables message or wire signal interrupts for PCIe and platforms devices.
> > >
> > > Co-developed-by: Nick Kossifidis <[email protected]>
> > > Signed-off-by: Nick Kossifidis <[email protected]>
> > > Signed-off-by: Tomasz Jeznach <[email protected]>
> > > ---
> > > drivers/iommu/riscv/iommu-pci.c | 72 ++++
> > > drivers/iommu/riscv/iommu-platform.c | 66 +++
> > > drivers/iommu/riscv/iommu.c | 604 ++++++++++++++++++++++++++-
> > > drivers/iommu/riscv/iommu.h | 28 ++
> > > 4 files changed, 769 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> > > index c91f963d7a29..9ea0647f7b92 100644
> > > --- a/drivers/iommu/riscv/iommu-pci.c
> > > +++ b/drivers/iommu/riscv/iommu-pci.c
> > > @@ -34,6 +34,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> > > {
> > > struct device *dev = &pdev->dev;
> > > struct riscv_iommu_device *iommu;
> > > + u64 icvec;
> > > int ret;
> > >
> > > ret = pci_enable_device_mem(pdev);
> > > @@ -67,14 +68,84 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> > > iommu->dev = dev;
> > > dev_set_drvdata(dev, iommu);
> > >
> > > + /* Check device reported capabilities. */
> > > + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> > > +
> > > + /* The PCI driver only uses MSIs, make sure the IOMMU supports this */
> > > + switch (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap)) {
> > > + case RISCV_IOMMU_CAP_IGS_MSI:
> > > + case RISCV_IOMMU_CAP_IGS_BOTH:
> > > + break;
> > > + default:
> > > + dev_err(dev, "unable to use message-signaled interrupts\n");
> > > + ret = -ENODEV;
> > > + goto fail;
> > > + }
> > > +
> > > dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > > pci_set_master(pdev);
> > >
> > > + /* Allocate and assign IRQ vectors for the various events */
> > > + ret = pci_alloc_irq_vectors(pdev, 1, RISCV_IOMMU_INTR_COUNT, PCI_IRQ_MSIX);
> > > + if (ret < 0) {
> > > + dev_err(dev, "unable to allocate irq vectors\n");
> > > + goto fail;
> > > + }
> > > +
> > > + ret = -ENODEV;
> > > +
> > > + iommu->irq_cmdq = msi_get_virq(dev, RISCV_IOMMU_INTR_CQ);
> > > + if (!iommu->irq_cmdq) {
> > > + dev_warn(dev, "no MSI vector %d for the command queue\n",
> > > + RISCV_IOMMU_INTR_CQ);
> > > + goto fail;
> > > + }
> > > +
> > > + iommu->irq_fltq = msi_get_virq(dev, RISCV_IOMMU_INTR_FQ);
> > > + if (!iommu->irq_fltq) {
> > > + dev_warn(dev, "no MSI vector %d for the fault/event queue\n",
> > > + RISCV_IOMMU_INTR_FQ);
> > > + goto fail;
> > > + }
> > > +
> > > + if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> > > + iommu->irq_pm = msi_get_virq(dev, RISCV_IOMMU_INTR_PM);
> > > + if (!iommu->irq_pm) {
> > > + dev_warn(dev,
> > > + "no MSI vector %d for performance monitoring\n",
> > > + RISCV_IOMMU_INTR_PM);
> > > + goto fail;
> > > + }
> > > + }
> > > +
> > > + if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> > > + iommu->irq_priq = msi_get_virq(dev, RISCV_IOMMU_INTR_PQ);
> > > + if (!iommu->irq_priq) {
> > > + dev_warn(dev,
> > > + "no MSI vector %d for page-request queue\n",
> > > + RISCV_IOMMU_INTR_PQ);
> > > + goto fail;
> > > + }
> > > + }
> > > +
> > > + /* Set simple 1:1 mapping for MSI vectors */
> > > + icvec = FIELD_PREP(RISCV_IOMMU_IVEC_CIV, RISCV_IOMMU_INTR_CQ) |
> > > + FIELD_PREP(RISCV_IOMMU_IVEC_FIV, RISCV_IOMMU_INTR_FQ);
> > > +
> > > + if (iommu->cap & RISCV_IOMMU_CAP_HPM)
> > > + icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PMIV, RISCV_IOMMU_INTR_PM);
> > > +
> > > + if (iommu->cap & RISCV_IOMMU_CAP_ATS)
> > > + icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PIV, RISCV_IOMMU_INTR_PQ);
> > > +
> > > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IVEC, icvec);
> > > +
> > > ret = riscv_iommu_init(iommu);
> > > if (!ret)
> > > return ret;
> > >
> > > fail:
> > > + pci_free_irq_vectors(pdev);
> > > pci_clear_master(pdev);
> > > pci_release_regions(pdev);
> > > pci_disable_device(pdev);
> > > @@ -85,6 +156,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> > > static void riscv_iommu_pci_remove(struct pci_dev *pdev)
> > > {
> > > riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> > > + pci_free_irq_vectors(pdev);
> > > pci_clear_master(pdev);
> > > pci_release_regions(pdev);
> > > pci_disable_device(pdev);
> > > diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> > > index e4e8ca6711e7..35935d3c7ef4 100644
> > > --- a/drivers/iommu/riscv/iommu-platform.c
> > > +++ b/drivers/iommu/riscv/iommu-platform.c
> > > @@ -20,6 +20,8 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
> > > struct device *dev = &pdev->dev;
> > > struct riscv_iommu_device *iommu = NULL;
> > > struct resource *res = NULL;
> > > + u32 fctl = 0;
> > > + int irq = 0;
> > > int ret = 0;
> > >
> > > iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> > > @@ -53,6 +55,70 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
> > > goto fail;
> > > }
> > >
> > > + iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> > > +
> > > + /* For now we only support WSIs until we have AIA support */
> >
> > I'm not completely understand AIA support here, because I saw the pci
> > case uses the MSI, and kernel seems to have the AIA implementation.
> > Could you please elaborate it?
> >
> > > + ret = FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap);
> > > + if (ret == RISCV_IOMMU_CAP_IGS_MSI) {
> > > + dev_err(dev, "IOMMU only supports MSIs\n");
> > > + goto fail;
> > > + }
> > > +
> > > + /* Parse IRQ assignment */
> > > + irq = platform_get_irq_byname_optional(pdev, "cmdq");
> > > + if (irq > 0)
> > > + iommu->irq_cmdq = irq;
> > > + else {
> > > + dev_err(dev, "no IRQ provided for the command queue\n");
> > > + goto fail;
> > > + }
> > > +
> > > + irq = platform_get_irq_byname_optional(pdev, "fltq");
> > > + if (irq > 0)
> > > + iommu->irq_fltq = irq;
> > > + else {
> > > + dev_err(dev, "no IRQ provided for the fault/event queue\n");
> > > + goto fail;
> > > + }
> > > +
> > > + if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> > > + irq = platform_get_irq_byname_optional(pdev, "pm");
> > > + if (irq > 0)
> > > + iommu->irq_pm = irq;
> > > + else {
> > > + dev_err(dev, "no IRQ provided for performance monitoring\n");
> > > + goto fail;
> > > + }
> > > + }
> > > +
> > > + if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> > > + irq = platform_get_irq_byname_optional(pdev, "priq");
> > > + if (irq > 0)
> > > + iommu->irq_priq = irq;
> > > + else {
> > > + dev_err(dev, "no IRQ provided for the page-request queue\n");
> > > + goto fail;
> > > + }
> > > + }
> >
> > Should we define the "interrupt-names" in dt-bindings?
> >
>
> Yes, this was brought up earlier wrt dt-bindings.
>
> I'm considering removal of interrupt names from DT (and get-byname
> option), as IOMMU hardware cause-to-vector remapping `icvec` should be
> used to map interrupt source to actual interrupt vector. If possible
> device driver should map cause to interrupt (based on number of
> vectors available) or rely on ICVEC WARL properties to discover fixed
> cause-to-vector mapping in the hardware.
>

I'm not sure if I understand correctly, but one thing we might need to
consider is when the vector numbers are less than interrupt sources,
for example, if IOMMU only supports single vector (i.e. supports a
single interrupt wire), the `request_threaded_irq` will request the
irq three times by using same irq number for command, fault and
page-request queues. It would cause a problem and fail to request an
IRQ for the other two queues. It seems that we still need to consider
this situation regardless of how we determine the IRQs for each
interrupt source.

> Please let me know if this is reasonable change.
>
> > > +
> > > + /* Make sure fctl.WSI is set */
> > > + fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> > > + fctl |= RISCV_IOMMU_FCTL_WSI;
> > > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> > > +
> > > + /* Parse Queue lengts */
> > > + ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > > + if (!ret)
> > > + dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > > +
> > > + ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > > + if (!ret)
> > > + dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > > +
> > > + ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > > + if (!ret)
> > > + dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > > +
> > > dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > >
> > > return riscv_iommu_init(iommu);
> > > diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> > > index 31dc3c458e13..5c4cf9875302 100644
> > > --- a/drivers/iommu/riscv/iommu.c
> > > +++ b/drivers/iommu/riscv/iommu.c
> > > @@ -45,6 +45,18 @@ static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
> > > module_param(ddt_mode, int, 0644);
> > > MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
> > >
> > > +static int cmdq_length = 1024;
> > > +module_param(cmdq_length, int, 0644);
> > > +MODULE_PARM_DESC(cmdq_length, "Command queue length.");
> > > +
> > > +static int fltq_length = 1024;
> > > +module_param(fltq_length, int, 0644);
> > > +MODULE_PARM_DESC(fltq_length, "Fault queue length.");
> > > +
> > > +static int priq_length = 1024;
> > > +module_param(priq_length, int, 0644);
> > > +MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
> > > +
> > > /* IOMMU PSCID allocation namespace. */
> > > #define RISCV_IOMMU_MAX_PSCID (1U << 20)
> > > static DEFINE_IDA(riscv_iommu_pscids);
> > > @@ -65,6 +77,497 @@ static DEFINE_IDA(riscv_iommu_pscids);
> > > static const struct iommu_domain_ops riscv_iommu_domain_ops;
> > > static const struct iommu_ops riscv_iommu_ops;
> > >
> > > +/*
> > > + * Common queue management routines
> > > + */
> > > +
> > > +/* Note: offsets are the same for all queues */
> > > +#define Q_HEAD(q) ((q)->qbr + (RISCV_IOMMU_REG_CQH - RISCV_IOMMU_REG_CQB))
> > > +#define Q_TAIL(q) ((q)->qbr + (RISCV_IOMMU_REG_CQT - RISCV_IOMMU_REG_CQB))
> > > +
> > > +static unsigned riscv_iommu_queue_consume(struct riscv_iommu_device *iommu,
> > > + struct riscv_iommu_queue *q, unsigned *ready)
> > > +{
> > > + u32 tail = riscv_iommu_readl(iommu, Q_TAIL(q));
> > > + *ready = q->lui;
> > > +
> > > + BUG_ON(q->cnt <= tail);
> > > + if (q->lui <= tail)
> > > + return tail - q->lui;
> > > + return q->cnt - q->lui;
> > > +}
> > > +
> > > +static void riscv_iommu_queue_release(struct riscv_iommu_device *iommu,
> > > + struct riscv_iommu_queue *q, unsigned count)
> > > +{
> > > + q->lui = (q->lui + count) & (q->cnt - 1);
> > > + riscv_iommu_writel(iommu, Q_HEAD(q), q->lui);
> > > +}
> > > +
> > > +static u32 riscv_iommu_queue_ctrl(struct riscv_iommu_device *iommu,
> > > + struct riscv_iommu_queue *q, u32 val)
> > > +{
> > > + cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
> > > +
> > > + riscv_iommu_writel(iommu, q->qcr, val);
> > > + do {
> > > + val = riscv_iommu_readl(iommu, q->qcr);
> > > + if (!(val & RISCV_IOMMU_QUEUE_BUSY))
> > > + break;
> > > + cpu_relax();
> > > + } while (get_cycles() < end_cycles);
> > > +
> > > + return val;
> > > +}
> > > +
> > > +static void riscv_iommu_queue_free(struct riscv_iommu_device *iommu,
> > > + struct riscv_iommu_queue *q)
> > > +{
> > > + size_t size = q->len * q->cnt;
> > > +
> > > + riscv_iommu_queue_ctrl(iommu, q, 0);
> > > +
> > > + if (q->base) {
> > > + if (q->in_iomem)
> > > + iounmap(q->base);
> > > + else
> > > + dmam_free_coherent(iommu->dev, size, q->base, q->base_dma);
> > > + }
> > > + if (q->irq)
> > > + free_irq(q->irq, q);
> > > +}
> > > +
> > > +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > > +
> > > +static int riscv_iommu_queue_init(struct riscv_iommu_device *iommu, int queue_id)
> > > +{
> > > + struct device *dev = iommu->dev;
> > > + struct riscv_iommu_queue *q = NULL;
> > > + size_t queue_size = 0;
> > > + irq_handler_t irq_check;
> > > + irq_handler_t irq_process;
> > > + const char *name;
> > > + int count = 0;
> > > + int irq = 0;
> > > + unsigned order = 0;
> > > + u64 qbr_val = 0;
> > > + u64 qbr_readback = 0;
> > > + u64 qbr_paddr = 0;
> > > + int ret = 0;
> > > +
> > > + switch (queue_id) {
> > > + case RISCV_IOMMU_COMMAND_QUEUE:
> > > + q = &iommu->cmdq;
> > > + q->len = sizeof(struct riscv_iommu_command);
> > > + count = iommu->cmdq_len;
> > > + irq = iommu->irq_cmdq;
> > > + irq_check = riscv_iommu_cmdq_irq_check;
> > > + irq_process = riscv_iommu_cmdq_process;
> > > + q->qbr = RISCV_IOMMU_REG_CQB;
> > > + q->qcr = RISCV_IOMMU_REG_CQCSR;
> > > + name = "cmdq";
> > > + break;
> > > + case RISCV_IOMMU_FAULT_QUEUE:
> > > + q = &iommu->fltq;
> > > + q->len = sizeof(struct riscv_iommu_fq_record);
> > > + count = iommu->fltq_len;
> > > + irq = iommu->irq_fltq;
> > > + irq_check = riscv_iommu_fltq_irq_check;
> > > + irq_process = riscv_iommu_fltq_process;
> > > + q->qbr = RISCV_IOMMU_REG_FQB;
> > > + q->qcr = RISCV_IOMMU_REG_FQCSR;
> > > + name = "fltq";
> > > + break;
> > > + case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > > + q = &iommu->priq;
> > > + q->len = sizeof(struct riscv_iommu_pq_record);
> > > + count = iommu->priq_len;
> > > + irq = iommu->irq_priq;
> > > + irq_check = riscv_iommu_priq_irq_check;
> > > + irq_process = riscv_iommu_priq_process;
> > > + q->qbr = RISCV_IOMMU_REG_PQB;
> > > + q->qcr = RISCV_IOMMU_REG_PQCSR;
> > > + name = "priq";
> > > + break;
> > > + default:
> > > + dev_err(dev, "invalid queue interrupt index in queue_init!\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + /* Polling not implemented */
> > > + if (!irq)
> > > + return -ENODEV;
> > > +
> > > + /* Allocate queue in memory and set the base register */
> > > + order = ilog2(count);
> > > + do {
> > > + queue_size = q->len * (1ULL << order);
> > > + q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> > > + if (q->base || queue_size < PAGE_SIZE)
> > > + break;
> > > +
> > > + order--;
> > > + } while (1);
> > > +
> > > + if (!q->base) {
> > > + dev_err(dev, "failed to allocate %s queue (cnt: %u)\n", name, count);
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + q->cnt = 1ULL << order;
> > > +
> > > + qbr_val = phys_to_ppn(q->base_dma) |
> > > + FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> > > +
> > > + riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> > > +
> > > + /*
> > > + * Queue base registers are WARL, so it's possible that whatever we wrote
> > > + * there was illegal/not supported by the hw in which case we need to make
> > > + * sure we set a supported PPN and/or queue size.
> > > + */
> > > + qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> > > + if (qbr_readback == qbr_val)
> > > + goto irq;
> > > +
> > > + dmam_free_coherent(dev, queue_size, q->base, q->base_dma);
> > > +
> > > + /* Get supported queue size */
> > > + order = FIELD_GET(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, qbr_readback) + 1;
> > > + q->cnt = 1ULL << order;
> > > + queue_size = q->len * q->cnt;
> > > +
> > > + /*
> > > + * In case we also failed to set PPN, it means the field is hardcoded and the
> > > + * queue resides in I/O memory instead, so get its physical address and
> > > + * ioremap it.
> > > + */
> > > + qbr_paddr = ppn_to_phys(qbr_readback);
> > > + if (qbr_paddr != q->base_dma) {
> > > + dev_info(dev,
> > > + "hardcoded ppn in %s base register, using io memory for the queue\n",
> > > + name);
> > > + dev_info(dev, "queue length for %s set to %i\n", name, q->cnt);
> > > + q->in_iomem = true;
> > > + q->base = ioremap(qbr_paddr, queue_size);
> > > + if (!q->base) {
> > > + dev_err(dev, "failed to map %s queue (cnt: %u)\n", name, q->cnt);
> > > + return -ENOMEM;
> > > + }
> > > + q->base_dma = qbr_paddr;
> > > + } else {
> > > + /*
> > > + * We only failed to set the queue size, re-try to allocate memory with
> > > + * the queue size supported by the hw.
> > > + */
> > > + dev_info(dev, "hardcoded queue size in %s base register\n", name);
> > > + dev_info(dev, "retrying with queue length: %i\n", q->cnt);
> > > + q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> > > + if (!q->base) {
> > > + dev_err(dev, "failed to allocate %s queue (cnt: %u)\n",
> > > + name, q->cnt);
> > > + return -ENOMEM;
> > > + }
> > > + }
> > > +
> > > + qbr_val = phys_to_ppn(q->base_dma) |
> > > + FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> > > + riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> > > +
> > > + /* Final check to make sure hw accepted our write */
> > > + qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> > > + if (qbr_readback != qbr_val) {
> > > + dev_err(dev, "failed to set base register for %s\n", name);
> > > + goto fail;
> > > + }
> > > +
> > > + irq:
> > > + if (request_threaded_irq(irq, irq_check, irq_process, IRQF_ONESHOT | IRQF_SHARED,
> > > + dev_name(dev), q)) {
> > > + dev_err(dev, "fail to request irq %d for %s\n", irq, name);
> > > + goto fail;
> > > + }
> > > +
> > > + q->irq = irq;
> > > +
> > > + /* Note: All RIO_xQ_EN/IE fields are in the same offsets */
> > > + ret =
> > > + riscv_iommu_queue_ctrl(iommu, q,
> > > + RISCV_IOMMU_QUEUE_ENABLE |
> > > + RISCV_IOMMU_QUEUE_INTR_ENABLE);
> > > + if (ret & RISCV_IOMMU_QUEUE_BUSY) {
> > > + dev_err(dev, "%s init timeout\n", name);
> > > + ret = -EBUSY;
> > > + goto fail;
> > > + }
> > > +
> > > + return 0;
> > > +
> > > + fail:
> > > + riscv_iommu_queue_free(iommu, q);
> > > + return 0;
> > > +}
> > > +
> > > +/*
> > > + * I/O MMU Command queue chapter 3.1
> > > + */
> > > +
> > > +static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
> > > +{
> > > + cmd->dword0 =
> > > + FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
> > > + RISCV_IOMMU_CMD_IOTINVAL_OPCODE) | FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
> > > + RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);
> > > + cmd->dword1 = 0;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > > + u64 addr)
> > > +{
> > > + cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > > + cmd->dword1 = addr;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
> > > + unsigned pscid)
> > > +{
> > > + cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
> > > + RISCV_IOMMU_CMD_IOTINVAL_PSCV;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_inval_set_gscid(struct riscv_iommu_command *cmd,
> > > + unsigned gscid)
> > > +{
> > > + cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_GSCID, gscid) |
> > > + RISCV_IOMMU_CMD_IOTINVAL_GV;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_iofence(struct riscv_iommu_command *cmd)
> > > +{
> > > + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> > > + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C);
> > > + cmd->dword1 = 0;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_iofence_set_av(struct riscv_iommu_command *cmd,
> > > + u64 addr, u32 data)
> > > +{
> > > + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> > > + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
> > > + FIELD_PREP(RISCV_IOMMU_CMD_IOFENCE_DATA, data) | RISCV_IOMMU_CMD_IOFENCE_AV;
> > > + cmd->dword1 = (addr >> 2);
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_iodir_inval_ddt(struct riscv_iommu_command *cmd)
> > > +{
> > > + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> > > + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT);
> > > + cmd->dword1 = 0;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_iodir_inval_pdt(struct riscv_iommu_command *cmd)
> > > +{
> > > + cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> > > + FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT);
> > > + cmd->dword1 = 0;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd,
> > > + unsigned devid)
> > > +{
> > > + cmd->dword0 |=
> > > + FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
> > > +}
> > > +
> > > +/* TODO: Convert into lock-less MPSC implementation. */
> > > +static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
> > > + struct riscv_iommu_command *cmd, bool sync)
> > > +{
> > > + u32 head, tail, next, last;
> > > + unsigned long flags;
> > > +
> > > + spin_lock_irqsave(&iommu->cq_lock, flags);
> > > + head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) & (iommu->cmdq.cnt - 1);
> > > + tail = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> > > + last = iommu->cmdq.lui;
> > > + if (tail != last) {
> > > + spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > > + /*
> > > + * FIXME: This is a workaround for dropped MMIO writes/reads on QEMU platform.
> > > + * While debugging of the problem is still ongoing, this provides
> > > + * a simple impolementation of try-again policy.
> > > + * Will be changed to lock-less algorithm in the feature.
> > > + */
> > > + dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (1st)\n", last, tail);
> > > + spin_lock_irqsave(&iommu->cq_lock, flags);
> > > + tail =
> > > + riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> > > + last = iommu->cmdq.lui;
> > > + if (tail != last) {
> > > + spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > > + dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (2nd)\n", last, tail);
> > > + spin_lock_irqsave(&iommu->cq_lock, flags);
> > > + }
> > > + }
> > > +
> > > + next = (last + 1) & (iommu->cmdq.cnt - 1);
> > > + if (next != head) {
> > > + struct riscv_iommu_command *ptr = iommu->cmdq.base;
> > > + ptr[last] = *cmd;
> > > + wmb();
> > > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_CQT, next);
> > > + iommu->cmdq.lui = next;
> > > + }
> > > +
> > > + spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > > +
> > > + if (sync && head != next) {
> > > + cycles_t start_time = get_cycles();
> > > + while (1) {
> > > + last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
> > > + (iommu->cmdq.cnt - 1);
> > > + if (head < next && last >= next)
> > > + break;
> > > + if (head > next && last < head && last >= next)
> > > + break;
> > > + if (RISCV_IOMMU_TIMEOUT < (get_cycles() - start_time)) {
> >
> > This condition will be imprecise, because here is not in irq disabled
> > context, it will be scheduled out or preempted. When we come back
> > here, it might be over 1 second, but the IOFENCE is actually
> > completed.
> >
>
> Good point. Thank.
>
>
> > > + dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
> > > + return false;
> > > + }
> > > + cpu_relax();
> > > + }
> > > + }
> > > +
> > > + return next != head;
> > > +}
> > > +
> > > +static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
> > > + struct riscv_iommu_command *cmd)
> > > +{
> > > + return riscv_iommu_post_sync(iommu, cmd, false);
> > > +}
> > > +
> > > +static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
> > > +{
> > > + struct riscv_iommu_command cmd;
> > > + riscv_iommu_cmd_iofence(&cmd);
> > > + return riscv_iommu_post_sync(iommu, &cmd, true);
> > > +}
> > > +
> > > +/* Command queue primary interrupt handler */
> > > +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
> > > +{
> > > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > + struct riscv_iommu_device *iommu =
> > > + container_of(q, struct riscv_iommu_device, cmdq);
> > > + u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > > + if (ipsr & RISCV_IOMMU_IPSR_CIP)
> > > + return IRQ_WAKE_THREAD;
> > > + return IRQ_NONE;
> > > +}
> > > +
> > > +/* Command queue interrupt hanlder thread function */
> > > +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
> > > +{
> > > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > + struct riscv_iommu_device *iommu;
> > > + unsigned ctrl;
> > > +
> > > + iommu = container_of(q, struct riscv_iommu_device, cmdq);
> > > +
> > > + /* Error reporting, clear error reports if any. */
> > > + ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQCSR);
> > > + if (ctrl & (RISCV_IOMMU_CQCSR_CQMF |
> > > + RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL)) {
> > > + riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);
> > > + dev_warn_ratelimited(iommu->dev,
> > > + "Command queue error: fault: %d tout: %d err: %d\n",
> > > + !!(ctrl & RISCV_IOMMU_CQCSR_CQMF),
> > > + !!(ctrl & RISCV_IOMMU_CQCSR_CMD_TO),
> > > + !!(ctrl & RISCV_IOMMU_CQCSR_CMD_ILL));
> >
> > We need to handle the error by either adjusting the tail to remove the
> > failed command or fixing the failed command itself. Otherwise, the
> > failed command will keep in the queue and IOMMU will try to execute
> > it. I guess the first option might be easier to implement.
> >
>
> Correct. Thanks for pointing this out.
> Error handling / recovery was not pushed in this series. There is
> work-in-progress series to handle various types of failures, including
> command processing errors, DDT misconfiguration, queue overflows,
> device reported faults handling, etc. I can bring some of the error
> handling here, if needed. Otherwise I'd prefer to keep it as separate
> series, sent out once this one is merged.

It sounds good to me, thanks.

>
> > > + }
> > > +
> > > + /* Clear fault interrupt pending. */
> > > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_CIP);
> > > +
> > > + return IRQ_HANDLED;
> > > +}
> > > +
> > > +/*
> > > + * Fault/event queue, chapter 3.2
> > > + */
> > > +
> > > +static void riscv_iommu_fault_report(struct riscv_iommu_device *iommu,
> > > + struct riscv_iommu_fq_record *event)
> > > +{
> > > + unsigned err, devid;
> > > +
> > > + err = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, event->hdr);
> > > + devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, event->hdr);
> > > +
> > > + dev_warn_ratelimited(iommu->dev,
> > > + "Fault %d devid: %d" " iotval: %llx iotval2: %llx\n", err,
> > > + devid, event->iotval, event->iotval2);
> > > +}
> > > +
> > > +/* Fault/event queue primary interrupt handler */
> > > +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data)
> > > +{
> > > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > + struct riscv_iommu_device *iommu =
> > > + container_of(q, struct riscv_iommu_device, fltq);
> > > + u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > > + if (ipsr & RISCV_IOMMU_IPSR_FIP)
> > > + return IRQ_WAKE_THREAD;
> > > + return IRQ_NONE;
> > > +}
> > > +
> > > +/* Fault queue interrupt hanlder thread function */
> > > +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data)
> > > +{
> > > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > + struct riscv_iommu_device *iommu;
> > > + struct riscv_iommu_fq_record *events;
> > > + unsigned cnt, len, idx, ctrl;
> > > +
> > > + iommu = container_of(q, struct riscv_iommu_device, fltq);
> > > + events = (struct riscv_iommu_fq_record *)q->base;
> > > +
> > > + /* Error reporting, clear error reports if any. */
> > > + ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQCSR);
> > > + if (ctrl & (RISCV_IOMMU_FQCSR_FQMF | RISCV_IOMMU_FQCSR_FQOF)) {
> > > + riscv_iommu_queue_ctrl(iommu, &iommu->fltq, ctrl);
> > > + dev_warn_ratelimited(iommu->dev,
> > > + "Fault queue error: fault: %d full: %d\n",
> > > + !!(ctrl & RISCV_IOMMU_FQCSR_FQMF),
> > > + !!(ctrl & RISCV_IOMMU_FQCSR_FQOF));
> > > + }
> > > +
> > > + /* Clear fault interrupt pending. */
> > > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_FIP);
> > > +
> > > + /* Report fault events. */
> > > + do {
> > > + cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> > > + if (!cnt)
> > > + break;
> > > + for (len = 0; len < cnt; idx++, len++)
> > > + riscv_iommu_fault_report(iommu, &events[idx]);
> > > + riscv_iommu_queue_release(iommu, q, cnt);
> > > + } while (1);
> > > +
> > > + return IRQ_HANDLED;
> > > +}
> > > +
> > > +/*
> > > + * Page request queue, chapter 3.3
> > > + */
> > > +
> > > /*
> > > * Register device for IOMMU tracking.
> > > */
> > > @@ -97,6 +600,54 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
> > > mutex_unlock(&iommu->eps_mutex);
> > > }
> > >
> > > +/* Page request interface queue primary interrupt handler */
> > > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
> > > +{
> > > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > + struct riscv_iommu_device *iommu =
> > > + container_of(q, struct riscv_iommu_device, priq);
> > > + u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > > + if (ipsr & RISCV_IOMMU_IPSR_PIP)
> > > + return IRQ_WAKE_THREAD;
> > > + return IRQ_NONE;
> > > +}
> > > +
> > > +/* Page request interface queue interrupt hanlder thread function */
> > > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> > > +{
> > > + struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > + struct riscv_iommu_device *iommu;
> > > + struct riscv_iommu_pq_record *requests;
> > > + unsigned cnt, idx, ctrl;
> > > +
> > > + iommu = container_of(q, struct riscv_iommu_device, priq);
> > > + requests = (struct riscv_iommu_pq_record *)q->base;
> > > +
> > > + /* Error reporting, clear error reports if any. */
> > > + ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_PQCSR);
> > > + if (ctrl & (RISCV_IOMMU_PQCSR_PQMF | RISCV_IOMMU_PQCSR_PQOF)) {
> > > + riscv_iommu_queue_ctrl(iommu, &iommu->priq, ctrl);
> > > + dev_warn_ratelimited(iommu->dev,
> > > + "Page request queue error: fault: %d full: %d\n",
> > > + !!(ctrl & RISCV_IOMMU_PQCSR_PQMF),
> > > + !!(ctrl & RISCV_IOMMU_PQCSR_PQOF));
> > > + }
> > > +
> > > + /* Clear page request interrupt pending. */
> > > + riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_PIP);
> > > +
> > > + /* Process page requests. */
> > > + do {
> > > + cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> > > + if (!cnt)
> > > + break;
> > > + dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
> > > + riscv_iommu_queue_release(iommu, q, cnt);
> > > + } while (1);
> > > +
> > > + return IRQ_HANDLED;
> > > +}
> > > +
> > > /*
> > > * Endpoint management
> > > */
> > > @@ -350,7 +901,29 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> > > unsigned long *start, unsigned long *end,
> > > size_t *pgsize)
> > > {
> > > - /* Command interface not implemented */
> > > + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> > > + struct riscv_iommu_command cmd;
> > > + unsigned long iova;
> > > +
> > > + if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
> > > + return;
> > > +
> > > + /* Domain not attached to an IOMMU! */
> > > + BUG_ON(!domain->iommu);
> > > +
> > > + riscv_iommu_cmd_inval_vma(&cmd);
> > > + riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
> > > +
> > > + if (start && end && pgsize) {
> > > + /* Cover only the range that is needed */
> > > + for (iova = *start; iova <= *end; iova += *pgsize) {
> > > + riscv_iommu_cmd_inval_set_addr(&cmd, iova);
> > > + riscv_iommu_post(domain->iommu, &cmd);
> > > + }
> > > + } else {
> > > + riscv_iommu_post(domain->iommu, &cmd);
> > > + }
> > > + riscv_iommu_iofence_sync(domain->iommu);
> > > }
> > >
> > > static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> > > @@ -610,6 +1183,9 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
> > > iommu_device_unregister(&iommu->iommu);
> > > iommu_device_sysfs_remove(&iommu->iommu);
> > > riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> > > + riscv_iommu_queue_free(iommu, &iommu->cmdq);
> > > + riscv_iommu_queue_free(iommu, &iommu->fltq);
> > > + riscv_iommu_queue_free(iommu, &iommu->priq);
> > > }
> > >
> > > int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > > @@ -632,6 +1208,16 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > > }
> > > #endif
> > >
> > > + /*
> > > + * Assign queue lengths from module parameters if not already
> > > + * set on the device tree.
> > > + */
> > > + if (!iommu->cmdq_len)
> > > + iommu->cmdq_len = cmdq_length;
> > > + if (!iommu->fltq_len)
> > > + iommu->fltq_len = fltq_length;
> > > + if (!iommu->priq_len)
> > > + iommu->priq_len = priq_length;
> > > /* Clear any pending interrupt flag. */
> > > riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
> > > RISCV_IOMMU_IPSR_CIP |
> > > @@ -639,7 +1225,20 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > > RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
> > > spin_lock_init(&iommu->cq_lock);
> > > mutex_init(&iommu->eps_mutex);
> > > + ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_COMMAND_QUEUE);
> > > + if (ret)
> > > + goto fail;
> > > + ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_FAULT_QUEUE);
> > > + if (ret)
> > > + goto fail;
> > > + if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
> > > + goto no_ats;
> > > +
> > > + ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
> > > + if (ret)
> > > + goto fail;
> > >
> > > + no_ats:
> > > ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
> > >
> > > if (ret) {
> > > @@ -663,5 +1262,8 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > > return 0;
> > > fail:
> > > riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> > > + riscv_iommu_queue_free(iommu, &iommu->priq);
> > > + riscv_iommu_queue_free(iommu, &iommu->fltq);
> > > + riscv_iommu_queue_free(iommu, &iommu->cmdq);
> > > return ret;
> > > }
> > > diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> > > index 7dc9baa59a50..04148a2a8ffd 100644
> > > --- a/drivers/iommu/riscv/iommu.h
> > > +++ b/drivers/iommu/riscv/iommu.h
> > > @@ -28,6 +28,24 @@
> > > #define IOMMU_PAGE_SIZE_1G BIT_ULL(30)
> > > #define IOMMU_PAGE_SIZE_512G BIT_ULL(39)
> > >
> > > +struct riscv_iommu_queue {
> > > + dma_addr_t base_dma; /* ring buffer bus address */
> > > + void *base; /* ring buffer pointer */
> > > + size_t len; /* single item length */
> > > + u32 cnt; /* items count */
> > > + u32 lui; /* last used index, consumer/producer share */
> > > + unsigned qbr; /* queue base register offset */
> > > + unsigned qcr; /* queue control and status register offset */
> > > + int irq; /* registered interrupt number */
> > > + bool in_iomem; /* indicates queue data are in I/O memory */
> > > +};
> > > +
> > > +enum riscv_queue_ids {
> > > + RISCV_IOMMU_COMMAND_QUEUE = 0,
> > > + RISCV_IOMMU_FAULT_QUEUE = 1,
> > > + RISCV_IOMMU_PAGE_REQUEST_QUEUE = 2
> > > +};
> > > +
> > > struct riscv_iommu_device {
> > > struct iommu_device iommu; /* iommu core interface */
> > > struct device *dev; /* iommu hardware */
> > > @@ -42,6 +60,11 @@ struct riscv_iommu_device {
> > > int irq_pm;
> > > int irq_priq;
> > >
> > > + /* Queue lengths */
> > > + int cmdq_len;
> > > + int fltq_len;
> > > + int priq_len;
> > > +
> > > /* supported and enabled hardware capabilities */
> > > u64 cap;
> > >
> > > @@ -53,6 +76,11 @@ struct riscv_iommu_device {
> > > unsigned ddt_mode;
> > > bool ddtp_in_iomem;
> > >
> > > + /* hardware queues */
> > > + struct riscv_iommu_queue cmdq;
> > > + struct riscv_iommu_queue fltq;
> > > + struct riscv_iommu_queue priq;
> > > +
> > > /* Connected end-points */
> > > struct rb_root eps;
> > > struct mutex eps_mutex;
> > > --
> > > 2.34.1
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > [email protected]
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> best,
> - Tomasz

2023-08-09 16:19:36

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Thu, Jul 27, 2023 at 10:42:47AM +0800, Zong Li wrote:

> Perhaps this question could be related to the scenarios in which
> devices wish to be in bypass mode when the IOMMU is in translation
> mode, and why IOMMU defines/supports this case. Currently, I could
> envision a scenario where a device is already connected to the IOMMU
> in hardware, but it is not functioning correctly, or there are
> performance impacts. If modifying the hardware is not feasible, a
> default configuration that allows bypass mode could be provided as a
> solution. There might be other scenarios that I might have overlooked.
> It seems to me since IOMMU supports this configuration, it would be
> advantageous to have an approach to achieve it, and DT might be a
> flexible way.

So far we've taken the approach that broken hardware is quirked in the
kernel by matching OF compatible string pattners. This is HW that is
completely broken and the IOMMU doesn't work at all for it.

HW that is slow or whatever is not quirked and this is an admin policy
choice where the system should land on the security/performance
spectrum.

So I'm not sure adding DT makes sense here.

Jason

2023-08-17 08:03:32

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 07/11] RISC-V: drivers/iommu/riscv: Add device context support

On 2023-07-19 20:33, Tomasz Jeznach wrote:
> Introduces per device translation context, with 1,2 or 3 tree level
> device tree structures.
>
> Signed-off-by: Tomasz Jeznach <[email protected]>
> ---
> drivers/iommu/riscv/iommu.c | 163 ++++++++++++++++++++++++++++++++++--
> drivers/iommu/riscv/iommu.h | 1 +
> 2 files changed, 158 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 5c4cf9875302..9ee7d2b222b5 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -41,7 +41,7 @@ MODULE_ALIAS("riscv-iommu");
> MODULE_LICENSE("GPL v2");
>
> /* Global IOMMU params. */
> -static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
> +static int ddt_mode = RISCV_IOMMU_DDTP_MODE_3LVL;
> module_param(ddt_mode, int, 0644);
> MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
>
> @@ -452,6 +452,14 @@ static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
> return riscv_iommu_post_sync(iommu, cmd, false);
> }
>
> +static bool riscv_iommu_iodir_inv_devid(struct riscv_iommu_device *iommu, unsigned devid)
> +{
> + struct riscv_iommu_command cmd;
> + riscv_iommu_cmd_iodir_inval_ddt(&cmd);
> + riscv_iommu_cmd_iodir_set_did(&cmd, devid);
> + return riscv_iommu_post(iommu, &cmd);
> +}
> +
> static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
> {
> struct riscv_iommu_command cmd;
> @@ -671,6 +679,94 @@ static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
> return false;
> }
>
> +/* TODO: implement proper device context management, e.g. teardown flow */
> +
> +/* Lookup or initialize device directory info structure. */
> +static struct riscv_iommu_dc *riscv_iommu_get_dc(struct riscv_iommu_device *iommu,
> + unsigned devid)
> +{
> + const bool base_format = !(iommu->cap & RISCV_IOMMU_CAP_MSI_FLAT);
> + unsigned depth = iommu->ddt_mode - RISCV_IOMMU_DDTP_MODE_1LVL;
> + u8 ddi_bits[3] = { 0 };
> + u64 *ddtp = NULL, ddt;
> +
> + if (iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_OFF ||
> + iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE)
> + return NULL;

I don't see how the driver can ever be useful without a DDT - I'd have
thought that you only ever want to use one of those modes on probe
failure or remove.

> +
> + /* Make sure the mode is valid */
> + if (iommu->ddt_mode > RISCV_IOMMU_DDTP_MODE_MAX)
> + return NULL;
> +
> + /*
> + * Device id partitioning for base format:
> + * DDI[0]: bits 0 - 6 (1st level) (7 bits)
> + * DDI[1]: bits 7 - 15 (2nd level) (9 bits)
> + * DDI[2]: bits 16 - 23 (3rd level) (8 bits)
> + *
> + * For extended format:
> + * DDI[0]: bits 0 - 5 (1st level) (6 bits)
> + * DDI[1]: bits 6 - 14 (2nd level) (9 bits)
> + * DDI[2]: bits 15 - 23 (3rd level) (9 bits)
> + */
> + if (base_format) {
> + ddi_bits[0] = 7;
> + ddi_bits[1] = 7 + 9;
> + ddi_bits[2] = 7 + 9 + 8;
> + } else {
> + ddi_bits[0] = 6;
> + ddi_bits[1] = 6 + 9;
> + ddi_bits[2] = 6 + 9 + 9;
> + }
> +
> + /* Make sure device id is within range */
> + if (devid >= (1 << ddi_bits[depth]))
> + return NULL;
> +
> + /* Get to the level of the non-leaf node that holds the device context */
> + for (ddtp = (u64 *) iommu->ddtp; depth-- > 0;) {
> + const int split = ddi_bits[depth];
> + /*
> + * Each non-leaf node is 64bits wide and on each level
> + * nodes are indexed by DDI[depth].
> + */
> + ddtp += (devid >> split) & 0x1FF;
> +
> + retry:
> + /*
> + * Check if this node has been populated and if not
> + * allocate a new level and populate it.
> + */
> + ddt = READ_ONCE(*ddtp);
> + if (ddt & RISCV_IOMMU_DDTE_VALID) {
> + ddtp = __va(ppn_to_phys(ddt));
> + } else {
> + u64 old, new = get_zeroed_page(GFP_KERNEL);
> + if (!new)
> + return NULL;
> +
> + old = cmpxchg64_relaxed(ddtp, ddt,
> + phys_to_ppn(__pa(new)) |
> + RISCV_IOMMU_DDTE_VALID);
> +
> + if (old != ddt) {
> + free_page(new);
> + goto retry;
> + }
> +
> + ddtp = (u64 *) new;
> + }
> + }
> +
> + /*
> + * Grab the node that matches DDI[depth], note that when using base
> + * format the device context is 4 * 64bits, and the extended format
> + * is 8 * 64bits, hence the (3 - base_format) below.
> + */
> + ddtp += (devid & ((64 << base_format) - 1)) << (3 - base_format);
> + return (struct riscv_iommu_dc *)ddtp;
> +}
> +
> static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
> {
> struct riscv_iommu_device *iommu;
> @@ -708,6 +804,9 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
> ep->iommu = iommu;
> ep->dev = dev;
>
> + /* Initial DC pointer can be NULL if IOMMU is configured in OFF or BARE mode */
> + ep->dc = riscv_iommu_get_dc(iommu, ep->devid);
> +
> dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
> ep->devid, ep->domid);
>
> @@ -734,6 +833,16 @@ static void riscv_iommu_release_device(struct device *dev)
> list_del(&ep->domain);
> mutex_unlock(&ep->lock);
>
> + if (ep->dc) {
> + // this should be already done by domain detach.

What's domain detach? ;)

> + ep->dc->tc = 0ULL;
> + wmb();
> + ep->dc->fsc = 0ULL;
> + ep->dc->iohgatp = 0ULL;
> + wmb();
> + riscv_iommu_iodir_inv_devid(iommu, ep->devid);
> + }
> +
> /* Remove endpoint from IOMMU tracking structures */
> mutex_lock(&iommu->eps_mutex);
> rb_erase(&ep->node, &iommu->eps);
> @@ -853,11 +962,21 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
> return 0;
> }
>
> +static u64 riscv_iommu_domain_atp(struct riscv_iommu_domain *domain)
> +{
> + u64 atp = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, domain->mode);
> + if (domain->mode != RISCV_IOMMU_DC_FSC_MODE_BARE)
> + atp |= FIELD_PREP(RISCV_IOMMU_DC_FSC_PPN, virt_to_pfn(domain->pgd_root));
> + return atp;
> +}
> +
> static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct device *dev)
> {
> struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> + struct riscv_iommu_dc *dc = ep->dc;
> int ret;
> + u64 val;
>
> /* PSCID not valid */
> if ((int)domain->pscid < 0)
> @@ -880,17 +999,44 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
> return ret;
> }
>
> - if (ep->iommu->ddt_mode != RISCV_IOMMU_DDTP_MODE_BARE ||
> - domain->domain.type != IOMMU_DOMAIN_IDENTITY) {
> - dev_warn(dev, "domain type %d not supported\n",
> - domain->domain.type);
> + if (ep->iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE &&
> + domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
> + dev_info(dev, "domain type %d attached w/ PSCID %u\n",
> + domain->domain.type, domain->pscid);
> + return 0;
> + }
> +
> + if (!dc) {
> return -ENODEV;
> }
>
> + /*
> + * S-Stage translation table. G-Stage remains unmodified (BARE).
> + */
> + val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
> +
> + dc->ta = cpu_to_le64(val);
> + dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +
> + wmb();
> +
> + /* Mark device context as valid, synchronise device context cache. */
> + val = RISCV_IOMMU_DC_TC_V;
> +
> + if (ep->iommu->cap & RISCV_IOMMU_CAP_AMO) {
> + val |= RISCV_IOMMU_DC_TC_GADE |
> + RISCV_IOMMU_DC_TC_SADE;
> + }
> +
> + dc->tc = cpu_to_le64(val);
> + wmb();
> +
> list_add_tail(&ep->domain, &domain->endpoints);
> mutex_unlock(&ep->lock);
> mutex_unlock(&domain->lock);
>
> + riscv_iommu_iodir_inv_devid(ep->iommu, ep->devid);
> +
> dev_info(dev, "domain type %d attached w/ PSCID %u\n",
> domain->domain.type, domain->pscid);
>
> @@ -1239,7 +1385,12 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> goto fail;
>
> no_ats:
> - ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
> + if (iommu_default_passthrough()) {
> + dev_info(dev, "iommu set to passthrough mode\n");
> + ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);

Yeah, disabling the whole IOMMU is not what default passthrough means...
drivers should not care about that at all, it only affects the core
code's choice of default domain type. Even if that is identity,
translation absolutely still needs to be available on a per-device
basis, for unmanaged domains or default domain changes via sysfs.

Thanks,
Robin.

> + } else {
> + ret = riscv_iommu_enable(iommu, ddt_mode);
> + }
>
> if (ret) {
> dev_err(dev, "cannot enable iommu device (%d)\n", ret);
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 04148a2a8ffd..9140df71e17b 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -105,6 +105,7 @@ struct riscv_iommu_endpoint {
> unsigned devid; /* PCI bus:device:function number */
> unsigned domid; /* PCI domain number, segment */
> struct rb_node node; /* device tracking node (lookup by devid) */
> + struct riscv_iommu_dc *dc; /* device context pointer */
> struct riscv_iommu_device *iommu; /* parent iommu device */
>
> struct mutex lock;

2023-08-19 08:25:24

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On 2023/8/16 10:16, Zong Li wrote:
> On Wed, Aug 16, 2023 at 2:38 AM Jason Gunthorpe<[email protected]> wrote:
>> On Tue, Aug 15, 2023 at 09:28:54AM +0800, Zong Li wrote:
>>> On Wed, Aug 9, 2023 at 10:57 PM Jason Gunthorpe<[email protected]> wrote:
>>>> On Thu, Jul 27, 2023 at 10:42:47AM +0800, Zong Li wrote:
>>>>
>>>>> Perhaps this question could be related to the scenarios in which
>>>>> devices wish to be in bypass mode when the IOMMU is in translation
>>>>> mode, and why IOMMU defines/supports this case. Currently, I could
>>>>> envision a scenario where a device is already connected to the IOMMU
>>>>> in hardware, but it is not functioning correctly, or there are
>>>>> performance impacts. If modifying the hardware is not feasible, a
>>>>> default configuration that allows bypass mode could be provided as a
>>>>> solution. There might be other scenarios that I might have overlooked.
>>>>> It seems to me since IOMMU supports this configuration, it would be
>>>>> advantageous to have an approach to achieve it, and DT might be a
>>>>> flexible way.
>>>> So far we've taken the approach that broken hardware is quirked in the
>>>> kernel by matching OF compatible string pattners. This is HW that is
>>>> completely broken and the IOMMU doesn't work at all for it.
>>>>
>>>> HW that is slow or whatever is not quirked and this is an admin policy
>>>> choice where the system should land on the security/performance
>>>> spectrum.
>>>>
>>>> So I'm not sure adding DT makes sense here.
>>>>
>>> Hi Jason,
>>> Sorry for being late here, I hadn't noticed this reply earlier. The
>>> approach seems to address the situation. Could you kindly provide
>>> information about the location of the patches? I was wondering about
>>> further details regarding this particular implementation. Thanks
>> There are a couple versions, eg
>> arm_smmu_def_domain_type()
>> qcom_smmu_def_domain_type()
>>
> I thought what you mentioned earlier is that there is a new approach
> being considered for this. I think what you point out is the same as
> Anup mentioned. However, as I mentioned earlier, I am exploring a more
> flexible approach to achieve this objective. This way, we can avoid
> hard coding anything (i.e.list compatible string) in the driver or
> requiring a kernel rebuild every time we need to change the mode for
> specific devices. For example, the driver could parse the device node
> to determine and record if a device will be set to bypass, and then
> the .def_domain_type could be used to set to IOMMU_DOMAIN_IDENTITY by
> the record. I'm not sure if it makes sense for everyone, it seems to
> me that it would be great if there is a way to do this. ????

What you described applies to the case where the device is *quirky*, it
"is not functioning correctly" when the IOMMU is configured in DMA
translation mode.

But it could not be used in another case, as described above, where
IOMMU translation has performance impacts on the device's DMA
efficiency. This is a kind of a user policy and should not be achieved
through the "DT/APCI + def_domain_type" mechanism.

The iommu subsystem has provided a sysfs interface that users can use to
change the domain type for devices. This means that users can change the
domain type at their wishes, without having to modify the kernel
configuration.

Best regards,
baolu


2023-08-19 15:17:48

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings

On Tue, Aug 15, 2023 at 09:28:54AM +0800, Zong Li wrote:
> On Wed, Aug 9, 2023 at 10:57 PM Jason Gunthorpe <[email protected]> wrote:
> >
> > On Thu, Jul 27, 2023 at 10:42:47AM +0800, Zong Li wrote:
> >
> > > Perhaps this question could be related to the scenarios in which
> > > devices wish to be in bypass mode when the IOMMU is in translation
> > > mode, and why IOMMU defines/supports this case. Currently, I could
> > > envision a scenario where a device is already connected to the IOMMU
> > > in hardware, but it is not functioning correctly, or there are
> > > performance impacts. If modifying the hardware is not feasible, a
> > > default configuration that allows bypass mode could be provided as a
> > > solution. There might be other scenarios that I might have overlooked.
> > > It seems to me since IOMMU supports this configuration, it would be
> > > advantageous to have an approach to achieve it, and DT might be a
> > > flexible way.
> >
> > So far we've taken the approach that broken hardware is quirked in the
> > kernel by matching OF compatible string pattners. This is HW that is
> > completely broken and the IOMMU doesn't work at all for it.
> >
> > HW that is slow or whatever is not quirked and this is an admin policy
> > choice where the system should land on the security/performance
> > spectrum.
> >
> > So I'm not sure adding DT makes sense here.
> >
>
> Hi Jason,
> Sorry for being late here, I hadn't noticed this reply earlier. The
> approach seems to address the situation. Could you kindly provide
> information about the location of the patches? I was wondering about
> further details regarding this particular implementation. Thanks

There are a couple versions, eg
arm_smmu_def_domain_type()
qcom_smmu_def_domain_type()

Jason

2024-02-23 14:05:37

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 00/13] Linux RISC-V IOMMU Support

>
> The RISC-V IOMMU specification is now ratified as-per the RISC-V international
> process [1]. The latest frozen specifcation can be found at:
> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
>
> At a high-level, the RISC-V IOMMU specification defines:
> 1) Memory-mapped programming interface
> - Mandatory and optional registers layout and description.
> - Software guidelines for device initialization and capabilities discovery.
> 2) In-memory queue interface
> - A command-queue used by software to queue commands to the IOMMU.
> - A fault/event queue used to bring faults and events to software’s attention.
> - A page-request queue used to report “Page Request” messages received from
> PCIe devices.
> - Message-signalled and wire-signaled interrupt mechanism.
> 3) In-memory data structures
> - Device-context: used to associate a device with an address space and to hold
> other per-device parameters used by the IOMMU to perform address translations.
> - Process-contexts: used to associate a different virtual address space based on
> device provided process identification number.
> - MSI page table configuration used to direct an MSI to a guest interrupt file
> in an IMSIC. The MSI page table formats are defined by the Advanced Interrupt
> Architecture specification [2].
>
> This series introduces complete single-level translation support, including shared
> virtual address (SVA), ATS/PRI interfaces in the kernel driver. Patches adding MSI
> identity remapping and G-Stage translation (GPA to SPA) are added only to excercise
> hardware interfaces, to be complemented with AIA/KVM bindings in follow-up series.
>
> This series is a logical regrouping of series of incremental patches based on
> RISC-V International IOMMU Task Group discussions and specification development
> process. Original series can be found at the maintainer's repository branch [3].
>
> These patches can also be found in the riscv_iommu_v1 branch at:
> https://github.com/tjeznach/linux/tree/riscv_iommu_v1
>
> To test this series, use QEMU/OpenSBI with RISC-V IOMMU implementation available in
> the riscv_iommu_v1 branch at:
> https://github.com/tjeznach/qemu/tree/riscv_iommu_v1
>
> References:
> [1] - https://wiki.riscv.org/display/HOME/Specification+Status
> [2] - https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> [3] - https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu-20230719
>
>
> Anup Patel (1):
> dt-bindings: Add RISC-V IOMMU bindings
>
> Tomasz Jeznach (10):
> RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
> RISC-V: arch/riscv/config: enable RISC-V IOMMU support
> MAINTAINERS: Add myself for RISC-V IOMMU driver
> RISC-V: drivers/iommu/riscv: Add sysfs interface
> RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
> RISC-V: drivers/iommu/riscv: Add device context support
> RISC-V: drivers/iommu/riscv: Add page table support
> RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support.
> RISC-V: drivers/iommu/riscv: Add MSI identity remapping
> RISC-V: drivers/iommu/riscv: Add G-Stage translation support
>
> .../bindings/iommu/riscv,iommu.yaml | 146 ++
> MAINTAINERS | 7 +
> arch/riscv/configs/defconfig | 1 +
> drivers/iommu/Kconfig | 1 +
> drivers/iommu/Makefile | 2 +-
> drivers/iommu/io-pgtable.c | 3 +
> drivers/iommu/riscv/Kconfig | 22 +
> drivers/iommu/riscv/Makefile | 1 +
> drivers/iommu/riscv/io_pgtable.c | 266 ++
> drivers/iommu/riscv/iommu-bits.h | 704 ++++++
> drivers/iommu/riscv/iommu-pci.c | 206 ++
> drivers/iommu/riscv/iommu-platform.c | 160 ++
> drivers/iommu/riscv/iommu-sysfs.c | 183 ++
> drivers/iommu/riscv/iommu.c | 2130 +++++++++++++++++
> drivers/iommu/riscv/iommu.h | 165 ++
> include/linux/io-pgtable.h | 2 +
> 16 files changed, 3998 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> create mode 100644 drivers/iommu/riscv/Kconfig
> create mode 100644 drivers/iommu/riscv/Makefile
> create mode 100644 drivers/iommu/riscv/io_pgtable.c
> create mode 100644 drivers/iommu/riscv/iommu-bits.h
> create mode 100644 drivers/iommu/riscv/iommu-pci.c
> create mode 100644 drivers/iommu/riscv/iommu-platform.c
> create mode 100644 drivers/iommu/riscv/iommu-sysfs.c
> create mode 100644 drivers/iommu/riscv/iommu.c
> create mode 100644 drivers/iommu/riscv/iommu.h
>
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Hi Tomasz,
Could I know if you have a plan for the next version and if you have
any estimates for when the v2 patch will be ready? We have some
patches based on top of your old implementation, and it would be great
if we can rebase them onto your next version. Thanks.

2024-04-04 17:37:54

by Tomasz Jeznach

[permalink] [raw]
Subject: Re: [PATCH 00/13] Linux RISC-V IOMMU Support

On Fri, Feb 23, 2024 at 6:04 AM Zong Li <[email protected]> wrote:
>
> >
> > The RISC-V IOMMU specification is now ratified as-per the RISC-V international
> > process [1]. The latest frozen specifcation can be found at:
> > https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
> >
> > At a high-level, the RISC-V IOMMU specification defines:
> > 1) Memory-mapped programming interface
> > - Mandatory and optional registers layout and description.
> > - Software guidelines for device initialization and capabilities discovery.
> > 2) In-memory queue interface
> > - A command-queue used by software to queue commands to the IOMMU.
> > - A fault/event queue used to bring faults and events to software’s attention.
> > - A page-request queue used to report “Page Request” messages received from
> > PCIe devices.
> > - Message-signalled and wire-signaled interrupt mechanism.
> > 3) In-memory data structures
> > - Device-context: used to associate a device with an address space and to hold
> > other per-device parameters used by the IOMMU to perform address translations.
> > - Process-contexts: used to associate a different virtual address space based on
> > device provided process identification number.
> > - MSI page table configuration used to direct an MSI to a guest interrupt file
> > in an IMSIC. The MSI page table formats are defined by the Advanced Interrupt
> > Architecture specification [2].
> >
> > This series introduces complete single-level translation support, including shared
> > virtual address (SVA), ATS/PRI interfaces in the kernel driver. Patches adding MSI
> > identity remapping and G-Stage translation (GPA to SPA) are added only to excercise
> > hardware interfaces, to be complemented with AIA/KVM bindings in follow-up series.
> >
> > This series is a logical regrouping of series of incremental patches based on
> > RISC-V International IOMMU Task Group discussions and specification development
> > process. Original series can be found at the maintainer's repository branch [3].
> >
> > These patches can also be found in the riscv_iommu_v1 branch at:
> > https://github.com/tjeznach/linux/tree/riscv_iommu_v1
> >
> > To test this series, use QEMU/OpenSBI with RISC-V IOMMU implementation available in
> > the riscv_iommu_v1 branch at:
> > https://github.com/tjeznach/qemu/tree/riscv_iommu_v1
> >
> > References:
> > [1] - https://wiki.riscv.org/display/HOME/Specification+Status
> > [2] - https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> > [3] - https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu-20230719
> >
> >
> > Anup Patel (1):
> > dt-bindings: Add RISC-V IOMMU bindings
> >
> > Tomasz Jeznach (10):
> > RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
> > RISC-V: arch/riscv/config: enable RISC-V IOMMU support
> > MAINTAINERS: Add myself for RISC-V IOMMU driver
> > RISC-V: drivers/iommu/riscv: Add sysfs interface
> > RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
> > RISC-V: drivers/iommu/riscv: Add device context support
> > RISC-V: drivers/iommu/riscv: Add page table support
> > RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support.
> > RISC-V: drivers/iommu/riscv: Add MSI identity remapping
> > RISC-V: drivers/iommu/riscv: Add G-Stage translation support
> >
> > .../bindings/iommu/riscv,iommu.yaml | 146 ++
> > MAINTAINERS | 7 +
> > arch/riscv/configs/defconfig | 1 +
> > drivers/iommu/Kconfig | 1 +
> > drivers/iommu/Makefile | 2 +-
> > drivers/iommu/io-pgtable.c | 3 +
> > drivers/iommu/riscv/Kconfig | 22 +
> > drivers/iommu/riscv/Makefile | 1 +
> > drivers/iommu/riscv/io_pgtable.c | 266 ++
> > drivers/iommu/riscv/iommu-bits.h | 704 ++++++
> > drivers/iommu/riscv/iommu-pci.c | 206 ++
> > drivers/iommu/riscv/iommu-platform.c | 160 ++
> > drivers/iommu/riscv/iommu-sysfs.c | 183 ++
> > drivers/iommu/riscv/iommu.c | 2130 +++++++++++++++++
> > drivers/iommu/riscv/iommu.h | 165 ++
> > include/linux/io-pgtable.h | 2 +
> > 16 files changed, 3998 insertions(+), 1 deletion(-)
> > create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommuyaml
> > create mode 100644 drivers/iommu/riscv/Kconfig
> > create mode 100644 drivers/iommu/riscv/Makefile
> > create mode 100644 drivers/iommu/riscv/io_pgtable.c
> > create mode 100644 drivers/iommu/riscv/iommu-bits.h
> > create mode 100644 drivers/iommu/riscv/iommu-pci.c
> > create mode 100644 drivers/iommu/riscv/iommu-platform.c
> > create mode 100644 drivers/iommu/riscv/iommu-sysfs.c
> > create mode 100644 drivers/iommu/riscv/iommu.c
> > create mode 100644 drivers/iommu/riscv/iommu.h
> >
> > --
> > 2.34.1
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> Hi Tomasz,
> Could I know if you have a plan for the next version and if you have
> any estimates for when the v2 patch will be ready? We have some
> patches based on top of your old implementation, and it would be great
> if we can rebase them onto your next version. Thanks.

Hi Zong,

Thank you for your interest. Next version of the iommu/riscv is almost ready to
be sent in next few days.
There is a number of bug fixes and design changes based on the testing and
great feedback after v1 was published.
Upcoming patch set will be smaller, with core functionality only, hopefully to
make the review easier. Functionality related to the MSI remapping, shared
virtual addressing, nested translations will be moved to separate patch sets.

Complete, up to date revision is always available at
https://github.com/tjeznach/linux/

regards,
- Tomasz

2024-04-10 05:38:39

by Zong Li

[permalink] [raw]
Subject: Re: [PATCH 00/13] Linux RISC-V IOMMU Support

On Fri, Apr 5, 2024 at 1:37 AM Tomasz Jeznach <[email protected]> wrote:
>
> On Fri, Feb 23, 2024 at 6:04 AM Zong Li <[email protected]> wrote:
> >
> > >
> > > The RISC-V IOMMU specification is now ratified as-per the RISC-V international
> > > process [1]. The latest frozen specifcation can be found at:
> > > https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
> > >
> > > At a high-level, the RISC-V IOMMU specification defines:
> > > 1) Memory-mapped programming interface
> > > - Mandatory and optional registers layout and description.
> > > - Software guidelines for device initialization and capabilities discovery.
> > > 2) In-memory queue interface
> > > - A command-queue used by software to queue commands to the IOMMU.
> > > - A fault/event queue used to bring faults and events to software’s attention.
> > > - A page-request queue used to report “Page Request” messages received from
> > > PCIe devices.
> > > - Message-signalled and wire-signaled interrupt mechanism.
> > > 3) In-memory data structures
> > > - Device-context: used to associate a device with an address space and to hold
> > > other per-device parameters used by the IOMMU to perform address translations.
> > > - Process-contexts: used to associate a different virtual address space based on
> > > device provided process identification number.
> > > - MSI page table configuration used to direct an MSI to a guest interrupt file
> > > in an IMSIC. The MSI page table formats are defined by the Advanced Interrupt
> > > Architecture specification [2].
> > >
> > > This series introduces complete single-level translation support, including shared
> > > virtual address (SVA), ATS/PRI interfaces in the kernel driver. Patches adding MSI
> > > identity remapping and G-Stage translation (GPA to SPA) are added only to excercise
> > > hardware interfaces, to be complemented with AIA/KVM bindings in follow-up series.
> > >
> > > This series is a logical regrouping of series of incremental patches based on
> > > RISC-V International IOMMU Task Group discussions and specification development
> > > process. Original series can be found at the maintainer's repository branch [3].
> > >
> > > These patches can also be found in the riscv_iommu_v1 branch at:
> > > https://github.com/tjeznach/linux/tree/riscv_iommu_v1
> > >
> > > To test this series, use QEMU/OpenSBI with RISC-V IOMMU implementation available in
> > > the riscv_iommu_v1 branch at:
> > > https://github.com/tjeznach/qemu/tree/riscv_iommu_v1
> > >
> > > References:
> > > [1] - https://wiki.riscv.org/display/HOME/Specification+Status
> > > [2] - https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> > > [3] - https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu-20230719
> > >
> > >
> > > Anup Patel (1):
> > > dt-bindings: Add RISC-V IOMMU bindings
> > >
> > > Tomasz Jeznach (10):
> > > RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
> > > RISC-V: arch/riscv/config: enable RISC-V IOMMU support
> > > MAINTAINERS: Add myself for RISC-V IOMMU driver
> > > RISC-V: drivers/iommu/riscv: Add sysfs interface
> > > RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
> > > RISC-V: drivers/iommu/riscv: Add device context support
> > > RISC-V: drivers/iommu/riscv: Add page table support
> > > RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support.
> > > RISC-V: drivers/iommu/riscv: Add MSI identity remapping
> > > RISC-V: drivers/iommu/riscv: Add G-Stage translation support
> > >
> > > .../bindings/iommu/riscv,iommu.yaml | 146 ++
> > > MAINTAINERS | 7 +
> > > arch/riscv/configs/defconfig | 1 +
> > > drivers/iommu/Kconfig | 1 +
> > > drivers/iommu/Makefile | 2 +-
> > > drivers/iommu/io-pgtable.c | 3 +
> > > drivers/iommu/riscv/Kconfig | 22 +
> > > drivers/iommu/riscv/Makefile | 1 +
> > > drivers/iommu/riscv/io_pgtable.c | 266 ++
> > > drivers/iommu/riscv/iommu-bits.h | 704 ++++++
> > > drivers/iommu/riscv/iommu-pci.c | 206 ++
> > > drivers/iommu/riscv/iommu-platform.c | 160 ++
> > > drivers/iommu/riscv/iommu-sysfs.c | 183 ++
> > > drivers/iommu/riscv/iommu.c | 2130 +++++++++++++++++
> > > drivers/iommu/riscv/iommu.h | 165 ++
> > > include/linux/io-pgtable.h | 2 +
> > > 16 files changed, 3998 insertions(+), 1 deletion(-)
> > > create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > create mode 100644 drivers/iommu/riscv/Kconfig
> > > create mode 100644 drivers/iommu/riscv/Makefile
> > > create mode 100644 drivers/iommu/riscv/io_pgtable.c
> > > create mode 100644 drivers/iommu/riscv/iommu-bits.h
> > > create mode 100644 drivers/iommu/riscv/iommu-pci.c
> > > create mode 100644 drivers/iommu/riscv/iommu-platform.c
> > > create mode 100644 drivers/iommu/riscv/iommu-sysfs.c
> > > create mode 100644 drivers/iommu/riscv/iommu.c
> > > create mode 100644 drivers/iommu/riscv/iommu.h
> > >
> > > --
> > > 2.34.1
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > [email protected]
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> > Hi Tomasz,
> > Could I know if you have a plan for the next version and if you have
> > any estimates for when the v2 patch will be ready? We have some
> > patches based on top of your old implementation, and it would be great
> > if we can rebase them onto your next version. Thanks.
>
> Hi Zong,
>
> Thank you for your interest. Next version of the iommu/riscv is almost ready to
> be sent in next few days.

Hi Tomasz,
Thanks you for the update, I would help to review the v2 series as well.

> There is a number of bug fixes and design changes based on the testing and
> great feedback after v1 was published.
> Upcoming patch set will be smaller, with core functionality only, hopefully to
> make the review easier. Functionality related to the MSI remapping, shared
> virtual addressing, nested translations will be moved to separate patch sets.
>
> Complete, up to date revision is always available at
> https://github.com/tjeznach/linux/
>
> regards,
> - Tomasz