The RISC-V AIA specification has been ratified as per the RISC-V International
process. The latest ratified AIA specification can be found at:
https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
At a high level, the AIA specification adds three things:
1) AIA CSRs
- Improved local interrupt support
2) Incoming Message Signaled Interrupt Controller (IMSIC)
- Per-HART MSI controller (see the sketch below)
- Supports MSI virtualization
- Supports IPIs, including virtualized IPIs
3) Advanced Platform-Level Interrupt Controller (APLIC)
- Wired interrupt controller
- In MSI-mode, converts wired interrupts into MSIs (i.e. acts as an MSI generator)
- In Direct-mode, injects external interrupts directly into HARTs
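As a rough illustration of the IMSIC item above (a sketch based on the AIA
spec, not code from this series; the function name and the ioremapped
pointer are placeholders):

	#include <linux/io.h>		/* writel() */
	#include <linux/types.h>

	/*
	 * Raise MSI 'id' on a hart by writing the interrupt identity to that
	 * hart's IMSIC interrupt file. Per the AIA spec, the little-endian
	 * SETEIPNUM register sits at byte offset 0x0 of the 4 KiB
	 * interrupt-file page ('intr_file' is assumed to be an ioremap()
	 * of that page).
	 */
	static inline void imsic_raise_msi_example(void __iomem *intr_file, u32 id)
	{
		writel(id, intr_file);
	}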
For an overview of the AIA specification, refer to the AIA virtualization
talk at KVM Forum 2022:
https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
https://www.youtube.com/watch?v=r071dL8Z0yo
To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
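For example, the AIA devices can be enabled on QEMU's "virt" machine with
something along these lines (machine option as per the QEMU documentation;
the remaining options are placeholders for your usual setup):

  qemu-system-riscv64 -M virt,aia=aplic-imsic -kernel <Image> ...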
These patches can also be found in the riscv_aia_v12 branch at:
https://github.com/avpatel/linux.git
Changes since v11:
- Rebased on Linux-6.8-rc1
- Included kernel/irq related patches from "genirq, irqchip: Convert ARM
MSI handling to per device MSI domains" series by Thomas.
(PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
https://lore.kernel.org/linux-arm-kernel/[email protected]/)
- Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
- Updated IMSIC driver to support per-device MSI domains for PCI and
platform devices.
Changes since v10:
- Rebased on Linux-6.6-rc7
- Dropped PATCH3 of the v10 series since it has been merged by Marc Z
  for Linux-6.6-rc7
- Changed the IMSIC ID management strategy from a 1-n approach to an
  x86-style 1-1 approach
Changes since v9:
- Rebased on Linux-6.6-rc4
- Use builtin_platform_driver() in PATCH5, PATCH9, and PATCH12
Changes since v8:
- Rebased on Linux-6.6-rc3
- Dropped PATCH2 of the v8 series since riscv_get_intc_hartid() won't be
  required, based on Marc Z's comments on ACPI AIA support
- Addressed Saravana's comments in PATCH3 of the v8 series
- Updated PATCH9 and PATCH13 of the v8 series based on comments from Sunil
Changes since v7:
- Rebased on Linux-6.6-rc1
- Addressed comments on PATCH1 of the v7 series and split it into two patches
- Used DEFINE_SIMPLE_PROP() in PATCH2 of the v7 series
Changes since v6:
- Rebased on Linux-6.5-rc4
- Updated PATCH2 to use IS_ENABLED(CONFIG_SPARC) instead of
!IS_ENABLED(CONFIG_OF_IRQ)
- Added new PATCH4 to fix syscore registration in PLIC driver
- Updated PATCH5 to convert the PLIC driver into a full-blown platform
  driver with a re-written probe function.
Changes since v5:
- Rebased on Linux-6.5-rc2
- Updated the overall series to ensure that only IPI, timer, and
  INTC drivers are probed very early whereas the rest of the interrupt
  controllers (such as PLIC, APLIC, and IMSIC) are probed as
regular platform drivers.
- Renamed riscv_fw_parent_hartid() to riscv_get_intc_hartid()
- New PATCH1 to add fw_devlink support for msi-parent DT property
- New PATCH2 to ensure all INTC suppliers are initialized, which in turn
  fixes the probing issue for PLIC, APLIC, and IMSIC as platform drivers
- New PATCH3 to use platform driver probing for PLIC
- Re-structured the IMSIC driver into two separate drivers: early and
  platform. The IMSIC early driver (PATCH7) only initializes the IMSIC state
  and provides IPIs, whereas the IMSIC platform driver (PATCH8) is probed as
  a regular platform driver and provides the MSI domain for platform devices.
- Re-structured the APLIC platform driver into three separate sources: main,
  direct-mode, and MSI-mode.
Changes since v4:
- Rebased on Linux-6.5-rc1
- Added "Dependencies" in the APLIC bindings (PATCH6 in v4)
- Dropped the PATCH6 which was changing the IOMMU DMA domain APIs
- Dropped use of IOMMU DMA APIs in the IMSIC driver (PATCH4)
Changes since v3:
- Rebased on Linux-6.4-rc6
- Dropped PATCH2 of the v3 series; instead we now set FWNODE_FLAG_BEST_EFFORT
  via IRQCHIP_DECLARE()
- Extended riscv_fw_parent_hartid() to support both DT and ACPI in PATCH1
- Extended iommu_dma_compose_msi_msg() instead of adding iommu_dma_select_msi()
  in PATCH6
- Addressed Conor's comments in PATCH3
- Addressed Conor's and Rob's comments in PATCH7
Changes since v2:
- Rebased on Linux-6.4-rc1
- Addressed Rob's comments on DT bindings patches 4 and 8.
- Addressed Marc's comments on the IMSIC driver PATCH5
- Replaced the use of OF APIs in the APLIC and IMSIC drivers with FWNODE
  APIs; this makes both drivers easily portable for ACPI support. This also
  removes unnecessary indirection from the APLIC and IMSIC drivers.
- PATCH1 is a new patch for portability with ACPI support
- PATCH2 is a new patch to fix probing in APLIC drivers for APLIC-only systems.
- PATCH7 is a new patch which addresses the IOMMU DMA domain issues pointed
out by SiFive
Changes since v1:
- Rebased on Linux-6.2-rc2
- Addressed comments on IMSIC DT bindings for PATCH4
- Use raw_spin_lock_irqsave() on ids_lock for PATCH5
- Improved MMIO alignment checks in PATCH5 to allow MMIO regions
with holes.
- Addressed comments on APLIC DT bindings for PATCH6
- Fixed warning splat in aplic_msi_write_msg() caused by
zeroed MSI message in PATCH7
- Dropped the riscv,slow-ipi DT property; a module parameter will be added
  for this in the future.
Anup Patel (11):
irqchip/sifive-plic: Convert PLIC driver into a platform driver
irqchip/riscv-intc: Add support for RISC-V AIA
dt-bindings: interrupt-controller: Add RISC-V incoming MSI controller
irqchip: Add RISC-V incoming MSI controller early driver
irqchip/riscv-imsic: Add device MSI domain support for platform
devices
irqchip/riscv-imsic: Add device MSI domain support for PCI devices
dt-bindings: interrupt-controller: Add RISC-V advanced PLIC
irqchip: Add RISC-V advanced PLIC driver for direct-mode
irqchip/riscv-aplic: Add support for MSI-mode
RISC-V: Select APLIC and IMSIC drivers
MAINTAINERS: Add entry for RISC-V AIA drivers
Björn Töpel (1):
genirq/matrix: Dynamic bitmap allocation
Thomas Gleixner (13):
irqchip/gic-v3: Make gic_irq_domain_select() robust for zero parameter
count
genirq/irqdomain: Remove the param count restriction from select()
genirq/msi: Extend msi_parent_ops
genirq/irqdomain: Add DOMAIN_BUS_DEVICE_IMS
platform-msi: Prepare for real per device domains
irqchip: Convert all platform MSI users to the new API
genirq/msi: Provide optional translation op
genirq/msi: Split msi_domain_alloc_irq_at()
genirq/msi: Provide DOMAIN_BUS_WIRED_TO_MSI
genirq/msi: Optionally use dev->fwnode for device domain
genirq/msi: Provide allocation/free functions for "wired" MSI
interrupts
genirq/irqdomain: Reroute device MSI create_mapping
genirq/msi: Provide MSI_FLAG_PARENT_PM_DEV
.../interrupt-controller/riscv,aplic.yaml | 172 ++++
.../interrupt-controller/riscv,imsics.yaml | 172 ++++
MAINTAINERS | 14 +
arch/riscv/Kconfig | 2 +
arch/x86/include/asm/hw_irq.h | 2 -
drivers/base/platform-msi.c | 97 ++
drivers/dma/mv_xor_v2.c | 8 +-
drivers/dma/qcom/hidma.c | 6 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +-
drivers/irqchip/Kconfig | 25 +
drivers/irqchip/Makefile | 3 +
drivers/irqchip/irq-gic-v3.c | 6 +-
drivers/irqchip/irq-riscv-aplic-direct.c | 343 +++++++
drivers/irqchip/irq-riscv-aplic-main.c | 232 +++++
drivers/irqchip/irq-riscv-aplic-main.h | 53 ++
drivers/irqchip/irq-riscv-aplic-msi.c | 256 +++++
drivers/irqchip/irq-riscv-imsic-early.c | 241 +++++
drivers/irqchip/irq-riscv-imsic-platform.c | 403 ++++++++
drivers/irqchip/irq-riscv-imsic-state.c | 887 ++++++++++++++++++
drivers/irqchip/irq-riscv-imsic-state.h | 105 +++
drivers/irqchip/irq-riscv-intc.c | 34 +-
drivers/irqchip/irq-sifive-plic.c | 239 +++--
drivers/mailbox/bcm-flexrm-mailbox.c | 8 +-
drivers/perf/arm_smmuv3_pmu.c | 4 +-
drivers/ufs/host/ufs-qcom.c | 8 +-
include/linux/irqchip/riscv-aplic.h | 119 +++
include/linux/irqchip/riscv-imsic.h | 87 ++
include/linux/irqdomain.h | 17 +
include/linux/irqdomain_defs.h | 2 +
include/linux/msi.h | 21 +
kernel/irq/irqdomain.c | 28 +-
kernel/irq/matrix.c | 28 +-
kernel/irq/msi.c | 184 +++-
33 files changed, 3636 insertions(+), 175 deletions(-)
create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,aplic.yaml
create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,imsics.yaml
create mode 100644 drivers/irqchip/irq-riscv-aplic-direct.c
create mode 100644 drivers/irqchip/irq-riscv-aplic-main.c
create mode 100644 drivers/irqchip/irq-riscv-aplic-main.h
create mode 100644 drivers/irqchip/irq-riscv-aplic-msi.c
create mode 100644 drivers/irqchip/irq-riscv-imsic-early.c
create mode 100644 drivers/irqchip/irq-riscv-imsic-platform.c
create mode 100644 drivers/irqchip/irq-riscv-imsic-state.c
create mode 100644 drivers/irqchip/irq-riscv-imsic-state.h
create mode 100644 include/linux/irqchip/riscv-aplic.h
create mode 100644 include/linux/irqchip/riscv-imsic.h
--
2.34.1
From: Thomas Gleixner <[email protected]>
Currently the irqdomain select callback is only invoked when the parameter
count of the fwspec arguments is not zero. That makes sense because then
the match is on the firmware node and eventually on the bus_token, which is
already handled in the core code.
The upcoming support for per device MSI domains requires doing real bus
token specific checks in the MSI parent domains with a zero parameter
count.
Make the gic-v3 select() callback handle that case.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
drivers/irqchip/irq-gic-v3.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 98b0329b7154..35b9362d178f 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1702,9 +1702,13 @@ static int gic_irq_domain_select(struct irq_domain *d,
irq_hw_number_t hwirq;
/* Not for us */
- if (fwspec->fwnode != d->fwnode)
+ if (fwspec->fwnode != d->fwnode)
return 0;
+ /* Handle pure domain searches */
+ if (!fwspec->param_count)
+ return d->bus_token == bus_token;
+
/* If this is not DT, then we have a single domain */
if (!is_of_node(fwspec->fwnode))
return 1;
--
2.34.1
From: Thomas Gleixner <[email protected]>
Now that the GIC-v3 callback can handle invocation with a fwspec parameter
count of 0, lift the restriction in the core code and invoke select()
unconditionally when the domain provides it.
Preparatory change for per device MSI domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
kernel/irq/irqdomain.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 0bdef4fe925b..8fee37918195 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -448,7 +448,7 @@ struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
*/
mutex_lock(&irq_domain_mutex);
list_for_each_entry(h, &irq_domain_list, link) {
- if (h->ops->select && fwspec->param_count)
+ if (h->ops->select)
rc = h->ops->select(h, fwspec, bus_token);
else if (h->ops->match)
rc = h->ops->match(h, to_of_node(fwnode), bus_token);
--
2.34.1
From: Thomas Gleixner <[email protected]>
Supporting per device MSI domains on ARM64, RISC-V and the zoo of
interrupt mechanisms needs a bit more information than what the
initial x86 implementation provides.
Add the following fields:
- required_flags: The flags which a parent domain requires to be set
- bus_select_token: The bus token of the parent domain for select()
- bus_select_mask: A bitmask of supported child domain bus types
This allows providing library functions which can be shared between
various interrupt chip implementations and avoids replicating mostly
similar code all over the place.
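As a hedged illustration (the names, flags, and mask below are examples,
not part of this patch), a parent MSI domain might describe itself roughly
like this:

	/* Hypothetical parent domain description using the new fields. */
	static const struct msi_parent_ops example_msi_parent_ops = {
		.supported_flags	= MSI_GENERIC_FLAGS_MASK,
		.required_flags		= MSI_FLAG_USE_DEF_DOM_OPS |
					  MSI_FLAG_USE_DEF_CHIP_OPS,
		.bus_select_token	= DOMAIN_BUS_NEXUS,
		.bus_select_mask	= BIT(DOMAIN_BUS_PCI_DEVICE_MSI) |
					  BIT(DOMAIN_BUS_PCI_DEVICE_MSIX),
		.prefix			= "EXAMPLE-",
		.init_dev_msi_info	= example_init_dev_msi_info,
	};

Here bus_select_token identifies the parent domain itself for pure domain
searches, while bus_select_mask lists the child domain bus tokens the
parent is willing to serve.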
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
include/linux/msi.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index ddace8c34dcf..d5d1513ef4d6 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -572,6 +572,11 @@ enum {
* struct msi_parent_ops - MSI parent domain callbacks and configuration info
*
* @supported_flags: Required: The supported MSI flags of the parent domain
+ * @required_flags: Optional: The required MSI flags of the parent MSI domain
+ * @bus_select_token: Optional: The bus token of the real parent domain for
+ * irq_domain::select()
+ * @bus_select_mask: Optional: A mask of supported BUS_DOMAINs for
+ * irq_domain::select()
* @prefix: Optional: Prefix for the domain and chip name
* @init_dev_msi_info: Required: Callback for MSI parent domains to setup parent
* domain specific domain flags, domain ops and interrupt chip
@@ -579,6 +584,9 @@ enum {
*/
struct msi_parent_ops {
u32 supported_flags;
+ u32 required_flags;
+ u32 bus_select_token;
+ u32 bus_select_mask;
const char *prefix;
bool (*init_dev_msi_info)(struct device *dev, struct irq_domain *domain,
struct irq_domain *msi_parent_domain,
--
2.34.1
From: Thomas Gleixner <[email protected]>
Add a new domain bus token to prepare for device MSI which aims to replace
the existing platform MSI maze.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
include/linux/irqdomain_defs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index c29921fd8cd1..4c69151cb9d2 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -26,6 +26,7 @@ enum irq_domain_bus_token {
DOMAIN_BUS_DMAR,
DOMAIN_BUS_AMDVI,
DOMAIN_BUS_PCI_DEVICE_IMS,
+ DOMAIN_BUS_DEVICE_IMS,
};
#endif /* _LINUX_IRQDOMAIN_DEFS_H */
--
2.34.1
From: Thomas Gleixner <[email protected]>
Provide functions to create and remove per device MSI domains which replace
the platform-MSI domains. The new model is that each device which utilizes
platform-MSI now gets its own private MSI domain which is "customized" in
size and has a device specific function to write the MSI message into the
device.
This is the same functionality as platform-MSI, but it avoids all the
downsides of platform-MSI, i.e. the extra ID bookkeeping and the special
data structure in the MSI descriptor. Further, the domains are only created
when the devices are really in use, so the burden is on the usage and not
on the infrastructure.
Fill in the domain template and provide two functions to init/allocate and
remove a per device MSI domain.
Until all users and parent domain providers are converted, the init/alloc
function invokes the original platform-MSI code when the irqdomain which is
associated with the device does not provide MSI parent functionality yet.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
drivers/base/platform-msi.c | 97 +++++++++++++++++++++++++++++++++++++
include/linux/msi.h | 4 ++
2 files changed, 101 insertions(+)
diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index f37ad34c80ec..dbd19f329354 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -13,6 +13,8 @@
#include <linux/msi.h>
#include <linux/slab.h>
+/* Begin of removal area. Once everything is converted over. Cleanup the includes too! */
+
#define DEV_ID_SHIFT 21
#define MAX_DEV_MSIS (1 << (32 - DEV_ID_SHIFT))
@@ -350,3 +352,98 @@ int platform_msi_device_domain_alloc(struct irq_domain *domain, unsigned int vir
return msi_domain_populate_irqs(domain->parent, dev, virq, nr_irqs, &data->arg);
}
+
+/* End of removal area */
+
+/* Real per device domain interfaces */
+
+/*
+ * This indirection can go when platform_device_ims_init_and_alloc_irqs()
+ * is switched to a proper irq_chip::irq_write_msi_msg() callback. Keep it
+ * simple for now.
+ */
+static void platform_msi_write_msi_msg(struct irq_data *d, struct msi_msg *msg)
+{
+ irq_write_msi_msg_t cb = d->chip_data;
+
+ cb(irq_data_get_msi_desc(d), msg);
+}
+
+static void platform_msi_set_desc_byindex(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+ arg->desc = desc;
+ arg->hwirq = desc->msi_index;
+}
+
+static const struct msi_domain_template platform_msi_template = {
+ .chip = {
+ .name = "pMSI",
+ .irq_mask = irq_chip_mask_parent,
+ .irq_unmask = irq_chip_unmask_parent,
+ .irq_write_msi_msg = platform_msi_write_msi_msg,
+ /* The rest is filled in by the platform MSI parent */
+ },
+
+ .ops = {
+ .set_desc = platform_msi_set_desc_byindex,
+ },
+
+ .info = {
+ .bus_token = DOMAIN_BUS_DEVICE_IMS,
+ },
+};
+
+/**
+ * platform_device_ims_init_and_alloc_irqs - Initialize platform device IMS
+ * and allocate interrupts for @dev
+ * @dev: The device for which to allocate interrupts
+ * @nvec: The number of interrupts to allocate
+ * @write_msi_msg: Callback to write an interrupt message for @dev
+ *
+ * Returns:
+ * Zero for success, or an error code in case of failure
+ *
+ * This creates a MSI domain on @dev which has @dev->msi.domain as
+ * parent. The parent domain sets up the new domain. The domain has
+ * a fixed size of @nvec. The domain is managed by devres and will
+ * be removed when the device is removed.
+ *
+ * Note: For migration purposes this falls back to the original platform_msi code
+ * up to the point where all platforms have been converted to the MSI
+ * parent model.
+ */
+int platform_device_ims_init_and_alloc_irqs(struct device *dev, unsigned int nvec,
+ irq_write_msi_msg_t write_msi_msg)
+{
+ struct irq_domain *domain = dev->msi.domain;
+
+ if (!domain || !write_msi_msg)
+ return -EINVAL;
+
+ /* Migration support. Will go away once everything is converted */
+ if (!irq_domain_is_msi_parent(domain))
+ return platform_msi_domain_alloc_irqs(dev, nvec, write_msi_msg);
+
+ /*
+ * @write_msi_msg is stored in the resulting msi_domain_info::data.
+ * The underlying domain creation mechanism will assign that
+ * callback to the resulting irq chip.
+ */
+ if (!msi_create_device_irq_domain(dev, MSI_DEFAULT_DOMAIN,
+ &platform_msi_template,
+ nvec, NULL, write_msi_msg))
+ return -ENODEV;
+
+ return msi_domain_alloc_irqs_range(dev, MSI_DEFAULT_DOMAIN, 0, nvec - 1);
+}
+EXPORT_SYMBOL_GPL(platform_device_ims_init_and_alloc_irqs);
+
+/**
+ * platform_device_ims_free_irqs_all - Free all interrupts for @dev
+ * @dev: The device for which to free interrupts
+ */
+void platform_device_ims_free_irqs_all(struct device *dev)
+{
+ msi_domain_free_irqs_all(dev, MSI_DEFAULT_DOMAIN);
+}
+EXPORT_SYMBOL_GPL(platform_device_ims_free_irqs_all);
diff --git a/include/linux/msi.h b/include/linux/msi.h
index d5d1513ef4d6..9bec9ca19800 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -664,6 +664,10 @@ int platform_msi_device_domain_alloc(struct irq_domain *domain, unsigned int vir
void platform_msi_device_domain_free(struct irq_domain *domain, unsigned int virq,
unsigned int nvec);
void *platform_msi_get_host_data(struct irq_domain *domain);
+/* Per device platform MSI */
+int platform_device_ims_init_and_alloc_irqs(struct device *dev, unsigned int nvec,
+ irq_write_msi_msg_t write_msi_msg);
+void platform_device_ims_free_irqs_all(struct device *dev);
bool msi_device_has_isolated_msi(struct device *dev);
#else /* CONFIG_GENERIC_MSI_IRQ */
--
2.34.1
From: Thomas Gleixner <[email protected]>
Switch all the users of the platform MSI domain over to invoke the new
interfaces which branch to the original platform MSI functions when the
irqdomain associated with the caller device does not yet provide MSI parent
functionality.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Cc: Vinod Koul <[email protected]>
Cc: Sinan Kaya <[email protected]>
Cc: Andy Gross <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Manivannan Sadhasivam <[email protected]>
---
drivers/dma/mv_xor_v2.c | 8 ++++----
drivers/dma/qcom/hidma.c | 6 +++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +++--
drivers/mailbox/bcm-flexrm-mailbox.c | 8 ++++----
drivers/perf/arm_smmuv3_pmu.c | 4 ++--
drivers/ufs/host/ufs-qcom.c | 8 ++++----
6 files changed, 20 insertions(+), 19 deletions(-)
diff --git a/drivers/dma/mv_xor_v2.c b/drivers/dma/mv_xor_v2.c
index 1ebfbe88e733..732663ad6d71 100644
--- a/drivers/dma/mv_xor_v2.c
+++ b/drivers/dma/mv_xor_v2.c
@@ -747,8 +747,8 @@ static int mv_xor_v2_probe(struct platform_device *pdev)
if (IS_ERR(xor_dev->clk))
return PTR_ERR(xor_dev->clk);
- ret = platform_msi_domain_alloc_irqs(&pdev->dev, 1,
- mv_xor_v2_set_msi_msg);
+ ret = platform_device_ims_init_and_alloc_irqs(&pdev->dev, 1,
+ mv_xor_v2_set_msi_msg);
if (ret)
return ret;
@@ -851,7 +851,7 @@ static int mv_xor_v2_probe(struct platform_device *pdev)
xor_dev->desc_size * MV_XOR_V2_DESC_NUM,
xor_dev->hw_desq_virt, xor_dev->hw_desq);
free_msi_irqs:
- platform_msi_domain_free_irqs(&pdev->dev);
+ platform_device_ims_free_irqs_all(&pdev->dev);
return ret;
}
@@ -867,7 +867,7 @@ static void mv_xor_v2_remove(struct platform_device *pdev)
devm_free_irq(&pdev->dev, xor_dev->irq, xor_dev);
- platform_msi_domain_free_irqs(&pdev->dev);
+ platform_device_ims_free_irqs_all(&pdev->dev);
tasklet_kill(&xor_dev->irq_tasklet);
}
diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c
index d63b93dc7047..4065d6eab49e 100644
--- a/drivers/dma/qcom/hidma.c
+++ b/drivers/dma/qcom/hidma.c
@@ -696,7 +696,7 @@ static void hidma_free_msis(struct hidma_dev *dmadev)
devm_free_irq(dev, virq, &dmadev->lldev);
}
- platform_msi_domain_free_irqs(dev);
+ platform_device_ims_free_irqs_all(dev);
#endif
}
@@ -706,8 +706,8 @@ static int hidma_request_msi(struct hidma_dev *dmadev,
#ifdef CONFIG_GENERIC_MSI_IRQ
int rc, i, virq;
- rc = platform_msi_domain_alloc_irqs(&pdev->dev, HIDMA_MSI_INTS,
- hidma_write_msi_msg);
+ rc = platform_device_ims_init_and_alloc_irqs(&pdev->dev, HIDMA_MSI_INTS,
+ hidma_write_msi_msg);
if (rc)
return rc;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0ffb1cf17e0b..84a765b1f64e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3125,7 +3125,8 @@ static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
static void arm_smmu_free_msis(void *data)
{
struct device *dev = data;
- platform_msi_domain_free_irqs(dev);
+
+ platform_device_ims_free_irqs_all(dev);
}
static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
@@ -3166,7 +3167,7 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
}
/* Allocate MSIs for evtq, gerror and priq. Ignore cmdq */
- ret = platform_msi_domain_alloc_irqs(dev, nvec, arm_smmu_write_msi_msg);
+ ret = platform_device_ims_init_and_alloc_irqs(dev, nvec, arm_smmu_write_msi_msg);
if (ret) {
dev_warn(dev, "failed to allocate MSIs - falling back to wired irqs\n");
return;
diff --git a/drivers/mailbox/bcm-flexrm-mailbox.c b/drivers/mailbox/bcm-flexrm-mailbox.c
index e3e28a4f7d01..333ca6c519cb 100644
--- a/drivers/mailbox/bcm-flexrm-mailbox.c
+++ b/drivers/mailbox/bcm-flexrm-mailbox.c
@@ -1587,8 +1587,8 @@ static int flexrm_mbox_probe(struct platform_device *pdev)
}
/* Allocate platform MSIs for each ring */
- ret = platform_msi_domain_alloc_irqs(dev, mbox->num_rings,
- flexrm_mbox_msi_write);
+ ret = platform_device_ims_init_and_alloc_irqs(dev, mbox->num_rings,
+ flexrm_mbox_msi_write);
if (ret)
goto fail_destroy_cmpl_pool;
@@ -1641,7 +1641,7 @@ static int flexrm_mbox_probe(struct platform_device *pdev)
fail_free_debugfs_root:
debugfs_remove_recursive(mbox->root);
- platform_msi_domain_free_irqs(dev);
+ platform_device_ims_free_irqs_all(dev);
fail_destroy_cmpl_pool:
dma_pool_destroy(mbox->cmpl_pool);
fail_destroy_bd_pool:
@@ -1657,7 +1657,7 @@ static void flexrm_mbox_remove(struct platform_device *pdev)
debugfs_remove_recursive(mbox->root);
- platform_msi_domain_free_irqs(dev);
+ platform_device_ims_free_irqs_all(dev);
dma_pool_destroy(mbox->cmpl_pool);
dma_pool_destroy(mbox->bd_pool);
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 6303b82566f9..32b604e8bdf3 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -716,7 +716,7 @@ static void smmu_pmu_free_msis(void *data)
{
struct device *dev = data;
- platform_msi_domain_free_irqs(dev);
+ platform_device_ims_free_irqs_all(dev);
}
static void smmu_pmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
@@ -746,7 +746,7 @@ static void smmu_pmu_setup_msi(struct smmu_pmu *pmu)
if (!(readl(pmu->reg_base + SMMU_PMCG_CFGR) & SMMU_PMCG_CFGR_MSI))
return;
- ret = platform_msi_domain_alloc_irqs(dev, 1, smmu_pmu_write_msi_msg);
+ ret = platform_device_ims_init_and_alloc_irqs(dev, 1, smmu_pmu_write_msi_msg);
if (ret) {
dev_warn(dev, "failed to allocate MSIs\n");
return;
diff --git a/drivers/ufs/host/ufs-qcom.c b/drivers/ufs/host/ufs-qcom.c
index 39eef470f8fa..f4c4becdef0b 100644
--- a/drivers/ufs/host/ufs-qcom.c
+++ b/drivers/ufs/host/ufs-qcom.c
@@ -1712,8 +1712,8 @@ static int ufs_qcom_config_esi(struct ufs_hba *hba)
* 2. Poll queues do not need ESI.
*/
nr_irqs = hba->nr_hw_queues - hba->nr_queues[HCTX_TYPE_POLL];
- ret = platform_msi_domain_alloc_irqs(hba->dev, nr_irqs,
- ufs_qcom_write_msi_msg);
+ ret = platform_device_ims_init_and_alloc_irqs(hba->dev, nr_irqs,
+ ufs_qcom_write_msi_msg);
if (ret) {
dev_err(hba->dev, "Failed to request Platform MSI %d\n", ret);
return ret;
@@ -1742,7 +1742,7 @@ static int ufs_qcom_config_esi(struct ufs_hba *hba)
devm_free_irq(hba->dev, desc->irq, hba);
}
msi_unlock_descs(hba->dev);
- platform_msi_domain_free_irqs(hba->dev);
+ platform_device_ims_free_irqs_all(hba->dev);
} else {
if (host->hw_ver.major == 6 && host->hw_ver.minor == 0 &&
host->hw_ver.step == 0)
@@ -1818,7 +1818,7 @@ static void ufs_qcom_remove(struct platform_device *pdev)
pm_runtime_get_sync(&(pdev)->dev);
ufshcd_remove(hba);
- platform_msi_domain_free_irqs(hba->dev);
+ platform_device_ims_free_irqs_all(hba->dev);
}
static const struct of_device_id ufs_qcom_of_match[] __maybe_unused = {
--
2.34.1
From: Thomas Gleixner <[email protected]>
irq_create_fwspec_mapping() requires translation of the firmware spec to a
hardware interrupt number and the trigger type information.
Wired interrupts which are connected to a wire-to-MSI bridge, like MBIGEN,
are allocated that way. So far MBIGEN provides a regular irqdomain which
then hooks backwards into the MSI infrastructure. That's an unholy mess and
will be replaced with per device MSI domains which are regular MSI domains.
Interrupts on MSI domains are not supported by irq_create_fwspec_mapping(),
but for making the wire to MSI bridges sane it makes sense to provide a
special allocation/free interface in the MSI infrastructure. That avoids
the backdoors into the core MSI allocation code and just shares all the
regular MSI infrastructure.
Provide an optional translation callback in msi_domain_ops which can be
utilized by these wire-to-MSI bridges. No other MSI domain should provide a
translation callback. The default translation callback of the MSI
irqdomains fails the translation with an error when the MSI domain does not
provide its own callback.
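As a rough sketch (not part of this patch; a wire-to-MSI bridge whose
firmware spec uses two cells is assumed), such a callback might look like:

	/* Hypothetical two-cell translate callback of a wire-to-MSI bridge. */
	static int example_msi_translate(struct irq_domain *d,
					 struct irq_fwspec *fwspec,
					 irq_hw_number_t *hwirq,
					 unsigned int *type)
	{
		if (WARN_ON(fwspec->param_count < 2))
			return -EINVAL;

		/* Cell 0: wired hwirq, cell 1: trigger type */
		*hwirq = fwspec->param[0];
		*type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
		return 0;
	}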
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
include/linux/msi.h | 5 +++++
kernel/irq/msi.c | 15 +++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 9bec9ca19800..fd184309a429 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -412,6 +412,7 @@ bool arch_restore_msi_irqs(struct pci_dev *dev);
struct irq_domain;
struct irq_domain_ops;
struct irq_chip;
+struct irq_fwspec;
struct device_node;
struct fwnode_handle;
struct msi_domain_info;
@@ -431,6 +432,8 @@ struct msi_domain_info;
* function.
* @msi_post_free: Optional function which is invoked after freeing
* all interrupts.
+ * @msi_translate: Optional translate callback to support the odd wire to
+ * MSI bridges, e.g. MBIGEN
*
* @get_hwirq, @msi_init and @msi_free are callbacks used by the underlying
* irqdomain.
@@ -468,6 +471,8 @@ struct msi_domain_ops {
struct device *dev);
void (*msi_post_free)(struct irq_domain *domain,
struct device *dev);
+ int (*msi_translate)(struct irq_domain *domain, struct irq_fwspec *fwspec,
+ irq_hw_number_t *hwirq, unsigned int *type);
};
/**
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 79b4a58ba9c3..c0e73788e878 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -726,11 +726,26 @@ static void msi_domain_free(struct irq_domain *domain, unsigned int virq,
irq_domain_free_irqs_top(domain, virq, nr_irqs);
}
+static int msi_domain_translate(struct irq_domain *domain, struct irq_fwspec *fwspec,
+ irq_hw_number_t *hwirq, unsigned int *type)
+{
+ struct msi_domain_info *info = domain->host_data;
+
+ /*
+ * This will catch allocations through the regular irqdomain path except
+ * for MSI domains which really support this, e.g. MBIGEN.
+ */
+ if (!info->ops->msi_translate)
+ return -ENOTSUPP;
+ return info->ops->msi_translate(domain, fwspec, hwirq, type);
+}
+
static const struct irq_domain_ops msi_domain_ops = {
.alloc = msi_domain_alloc,
.free = msi_domain_free,
.activate = msi_domain_activate,
.deactivate = msi_domain_deactivate,
+ .translate = msi_domain_translate,
};
static irq_hw_number_t msi_domain_ops_get_hwirq(struct msi_domain_info *info,
--
2.34.1
From: Thomas Gleixner <[email protected]>
In preparation for providing a special allocation function for wired
interrupts which are connected to a wire-to-MSI bridge, split the inner
workings of msi_domain_alloc_irq_at() out into a helper function so the
code can be shared.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
kernel/irq/msi.c | 76 +++++++++++++++++++++++++++---------------------
1 file changed, 43 insertions(+), 33 deletions(-)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index c0e73788e878..8d463901c864 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1446,34 +1446,10 @@ int msi_domain_alloc_irqs_all_locked(struct device *dev, unsigned int domid, int
return msi_domain_alloc_locked(dev, &ctrl);
}
-/**
- * msi_domain_alloc_irq_at - Allocate an interrupt from a MSI interrupt domain at
- * a given index - or at the next free index
- *
- * @dev: Pointer to device struct of the device for which the interrupts
- * are allocated
- * @domid: Id of the interrupt domain to operate on
- * @index: Index for allocation. If @index == %MSI_ANY_INDEX the allocation
- * uses the next free index.
- * @affdesc: Optional pointer to an interrupt affinity descriptor structure
- * @icookie: Optional pointer to a domain specific per instance cookie. If
- * non-NULL the content of the cookie is stored in msi_desc::data.
- * Must be NULL for MSI-X allocations
- *
- * This requires a MSI interrupt domain which lets the core code manage the
- * MSI descriptors.
- *
- * Return: struct msi_map
- *
- * On success msi_map::index contains the allocated index number and
- * msi_map::virq the corresponding Linux interrupt number
- *
- * On failure msi_map::index contains the error code and msi_map::virq
- * is %0.
- */
-struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
- const struct irq_affinity_desc *affdesc,
- union msi_instance_cookie *icookie)
+static struct msi_map __msi_domain_alloc_irq_at(struct device *dev, unsigned int domid,
+ unsigned int index,
+ const struct irq_affinity_desc *affdesc,
+ union msi_instance_cookie *icookie)
{
struct msi_ctrl ctrl = { .domid = domid, .nirqs = 1, };
struct irq_domain *domain;
@@ -1481,17 +1457,16 @@ struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, u
struct msi_desc *desc;
int ret;
- msi_lock_descs(dev);
domain = msi_get_device_domain(dev, domid);
if (!domain) {
map.index = -ENODEV;
- goto unlock;
+ return map;
}
desc = msi_alloc_desc(dev, 1, affdesc);
if (!desc) {
map.index = -ENOMEM;
- goto unlock;
+ return map;
}
if (icookie)
@@ -1500,7 +1475,7 @@ struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, u
ret = msi_insert_desc(dev, desc, domid, index);
if (ret) {
map.index = ret;
- goto unlock;
+ return map;
}
ctrl.first = ctrl.last = desc->msi_index;
@@ -1513,7 +1488,42 @@ struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, u
map.index = desc->msi_index;
map.virq = desc->irq;
}
-unlock:
+ return map;
+}
+
+/**
+ * msi_domain_alloc_irq_at - Allocate an interrupt from a MSI interrupt domain at
+ * a given index - or at the next free index
+ *
+ * @dev: Pointer to device struct of the device for which the interrupts
+ * are allocated
+ * @domid: Id of the interrupt domain to operate on
+ * @index: Index for allocation. If @index == %MSI_ANY_INDEX the allocation
+ * uses the next free index.
+ * @affdesc: Optional pointer to an interrupt affinity descriptor structure
+ * @icookie: Optional pointer to a domain specific per instance cookie. If
+ * non-NULL the content of the cookie is stored in msi_desc::data.
+ * Must be NULL for MSI-X allocations
+ *
+ * This requires a MSI interrupt domain which lets the core code manage the
+ * MSI descriptors.
+ *
+ * Return: struct msi_map
+ *
+ * On success msi_map::index contains the allocated index number and
+ * msi_map::virq the corresponding Linux interrupt number
+ *
+ * On failure msi_map::index contains the error code and msi_map::virq
+ * is %0.
+ */
+struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
+ const struct irq_affinity_desc *affdesc,
+ union msi_instance_cookie *icookie)
+{
+ struct msi_map map;
+
+ msi_lock_descs(dev);
+ map = __msi_domain_alloc_irq_at(dev, domid, index, affdesc, icookie);
msi_unlock_descs(dev);
return map;
}
--
2.34.1
From: Thomas Gleixner <[email protected]>
Provide a domain bus token for the upcoming support for wire to MSI device
domains so the domain can be distinguished from regular device MSI domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
include/linux/irqdomain_defs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index 4c69151cb9d2..f59d2e9941a2 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -27,6 +27,7 @@ enum irq_domain_bus_token {
DOMAIN_BUS_AMDVI,
DOMAIN_BUS_PCI_DEVICE_IMS,
DOMAIN_BUS_DEVICE_IMS,
+ DOMAIN_BUS_WIRED_TO_MSI,
};
#endif /* _LINUX_IRQDOMAIN_DEFS_H */
--
2.34.1
From: Thomas Gleixner <[email protected]>
To support wire-to-MSI domains via the MSI infrastructure it is required to
use the firmware node of the device which implements the bridge for creating
the MSI domain. Otherwise the existing firmware match mechanisms to find the
correct irqdomain for a wired interrupt which is connected to a wire-to-MSI
bridge would fail.
This cannot be used for the general case because not all devices provide
firmware nodes, and all regular per device MSI domains are directly
associated with the device and do not have to be searched for.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
include/linux/msi.h | 2 ++
kernel/irq/msi.c | 20 ++++++++++++++++----
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index fd184309a429..ac73f678da7d 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -552,6 +552,8 @@ enum {
MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS = (1 << 5),
/* Free MSI descriptors */
MSI_FLAG_FREE_MSI_DESCS = (1 << 6),
+ /* Use dev->fwnode for MSI device domain creation */
+ MSI_FLAG_USE_DEV_FWNODE = (1 << 7),
/* Mask for the generic functionality */
MSI_GENERIC_FLAGS_MASK = GENMASK(15, 0),
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 8d463901c864..5289fc2c7630 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -960,9 +960,9 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
void *chip_data)
{
struct irq_domain *domain, *parent = dev->msi.domain;
- const struct msi_parent_ops *pops;
+ struct fwnode_handle *fwnode, *fwnalloced = NULL;
struct msi_domain_template *bundle;
- struct fwnode_handle *fwnode;
+ const struct msi_parent_ops *pops;
if (!irq_domain_is_msi_parent(parent))
return false;
@@ -985,7 +985,19 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
pops->prefix ? : "", bundle->chip.name, dev_name(dev));
bundle->chip.name = bundle->name;
- fwnode = irq_domain_alloc_named_fwnode(bundle->name);
+ /*
+ * Using the device firmware node is required for wire to MSI
+ * device domains so that the existing firmware results in a domain
+ * match.
+ * All other device domains like PCI/MSI use the named firmware
+ * node as they are not guaranteed to have a fwnode. They are never
+ * looked up and always handled in the context of the device.
+ */
+ if (bundle->info.flags & MSI_FLAG_USE_DEV_FWNODE)
+ fwnode = dev->fwnode;
+ else
+ fwnode = fwnalloced = irq_domain_alloc_named_fwnode(bundle->name);
+
if (!fwnode)
goto free_bundle;
@@ -1012,7 +1024,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
fail:
msi_unlock_descs(dev);
free_fwnode:
- irq_domain_free_fwnode(fwnode);
+ irq_domain_free_fwnode(fwnalloced);
free_bundle:
kfree(bundle);
return false;
--
2.34.1
From: Thomas Gleixner <[email protected]>
To properly support wire-to-MSI bridges in the MSI core infrastructure, it
is required to have separate allocation/free interfaces which can be invoked
from the regular irqdomain allocation/free functions.
The mechanism for allocation is:
- Allocate the next free MSI descriptor index in the domain
- Store the hardware interrupt number and the trigger type
which was extracted by the irqdomain core from the firmware spec
in the MSI descriptor device cookie so it can be retrieved by
the underlying interrupt domain and interrupt chip
- Use the regular MSI allocation mechanism for the newly allocated
  index which returns a fully initialized Linux interrupt on success
This works because:
- the domains have a fixed size
- each hardware interrupt is only allocated once
- the underlying domain does not care about the MSI index; it only cares
  about the hardware interrupt number and the trigger type
The free function looks up the MSI index in the MSI descriptor of the
provided Linux interrupt number and uses the regular index based free
functions of the MSI core.
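For illustration only (the helper name and retrieval point are assumptions;
the cookie packing itself is shown in the patch below), the underlying
domain can recover the stored information roughly like this:

	/* Sketch: unpack hwirq/type stored by msi_device_domain_alloc_wired(). */
	static void example_unpack_wired_cookie(struct msi_desc *desc,
						irq_hw_number_t *hwirq,
						unsigned int *type)
	{
		u64 cookie = desc->data.icookie.value;

		*hwirq = lower_32_bits(cookie);
		*type = upper_32_bits(cookie);
	}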
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
include/linux/irqdomain.h | 17 ++++++++++
kernel/irq/msi.c | 68 +++++++++++++++++++++++++++++++++++++++
2 files changed, 85 insertions(+)
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index ee0a82c60508..21ecf582a0fe 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -619,6 +619,23 @@ static inline bool irq_domain_is_msi_device(struct irq_domain *domain)
#endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */
+#ifdef CONFIG_GENERIC_MSI_IRQ
+int msi_device_domain_alloc_wired(struct irq_domain *domain, unsigned int hwirq,
+ unsigned int type);
+void msi_device_domain_free_wired(struct irq_domain *domain, unsigned int virq);
+#else
+static inline int msi_device_domain_alloc_wired(struct irq_domain *domain, unsigned int hwirq,
+ unsigned int type)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+static inline void msi_device_domain_free_wired(struct irq_domain *domain, unsigned int virq)
+{
+ WARN_ON_ONCE(1);
+}
+#endif
+
#else /* CONFIG_IRQ_DOMAIN */
static inline void irq_dispose_mapping(unsigned int virq) { }
static inline struct irq_domain *irq_find_matching_fwnode(
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 5289fc2c7630..07e9daaf0657 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1540,6 +1540,50 @@ struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, u
return map;
}
+/**
+ * msi_device_domain_alloc_wired - Allocate a "wired" interrupt on @domain
+ * @domain: The domain to allocate on
+ * @hwirq: The hardware interrupt number to allocate for
+ * @type: The interrupt type
+ *
+ * This weirdness supports wire to MSI controllers like MBIGEN.
+ *
+ * @hwirq is the hardware interrupt number which is handed in from
+ * irq_create_fwspec_mapping(). As the wire to MSI domain is sparse, but
+ * sized in firmware, the hardware interrupt number cannot be used as MSI
+ * index. For the underlying irq chip the MSI index is irrelevant and
+ * all it needs is the hardware interrupt number.
+ *
+ * To handle this the MSI index is allocated with MSI_ANY_INDEX and the
+ * hardware interrupt number is stored along with the type information in
+ * msi_desc::cookie so the underlying interrupt chip and domain code can
+ * retrieve it.
+ *
+ * Return: The Linux interrupt number (> 0) or an error code
+ */
+int msi_device_domain_alloc_wired(struct irq_domain *domain, unsigned int hwirq,
+ unsigned int type)
+{
+ unsigned int domid = MSI_DEFAULT_DOMAIN;
+ union msi_instance_cookie icookie = { };
+ struct device *dev = domain->dev;
+ struct msi_map map = { };
+
+ if (WARN_ON_ONCE(!dev || domain->bus_token != DOMAIN_BUS_WIRED_TO_MSI))
+ return -EINVAL;
+
+ icookie.value = ((u64)type << 32) | hwirq;
+
+ msi_lock_descs(dev);
+ if (WARN_ON_ONCE(msi_get_device_domain(dev, domid) != domain))
+ map.index = -EINVAL;
+ else
+ map = __msi_domain_alloc_irq_at(dev, domid, MSI_ANY_INDEX, NULL, &icookie);
+ msi_unlock_descs(dev);
+
+ return map.index >= 0 ? map.virq : map.index;
+}
+
static void __msi_domain_free_irqs(struct device *dev, struct irq_domain *domain,
struct msi_ctrl *ctrl)
{
@@ -1665,6 +1709,30 @@ void msi_domain_free_irqs_all(struct device *dev, unsigned int domid)
msi_unlock_descs(dev);
}
+/**
+ * msi_device_domain_free_wired - Free a wired interrupt in @domain
+ * @domain: The domain to free the interrupt on
+ * @virq: The Linux interrupt number to free
+ *
+ * This is the counterpart of msi_device_domain_alloc_wired() for the
+ * weird wired to MSI converting domains.
+ */
+void msi_device_domain_free_wired(struct irq_domain *domain, unsigned int virq)
+{
+ struct msi_desc *desc = irq_get_msi_desc(virq);
+ struct device *dev = domain->dev;
+
+ if (WARN_ON_ONCE(!dev || !desc || domain->bus_token != DOMAIN_BUS_WIRED_TO_MSI))
+ return;
+
+ msi_lock_descs(dev);
+ if (!WARN_ON_ONCE(msi_get_device_domain(dev, MSI_DEFAULT_DOMAIN) != domain)) {
+ msi_domain_free_irqs_range_locked(dev, MSI_DEFAULT_DOMAIN, desc->msi_index,
+ desc->msi_index);
+ }
+ msi_unlock_descs(dev);
+}
+
/**
* msi_get_domain_info - Get the MSI interrupt domain info for @domain
* @domain: The interrupt domain to retrieve data from
--
2.34.1
From: Thomas Gleixner <[email protected]>
Reroute interrupt allocation in irq_create_fwspec_mapping() if the domain
is a MSI device domain. This is required to convert the support for wire
to MSI bridges to per device MSI domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
kernel/irq/irqdomain.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 8fee37918195..aeb41655d6de 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -29,6 +29,7 @@ static int irq_domain_alloc_irqs_locked(struct irq_domain *domain, int irq_base,
unsigned int nr_irqs, int node, void *arg,
bool realloc, const struct irq_affinity_desc *affinity);
static void irq_domain_check_hierarchy(struct irq_domain *domain);
+static void irq_domain_free_one_irq(struct irq_domain *domain, unsigned int virq);
struct irqchip_fwid {
struct fwnode_handle fwnode;
@@ -858,8 +859,13 @@ unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec)
}
if (irq_domain_is_hierarchy(domain)) {
- virq = irq_domain_alloc_irqs_locked(domain, -1, 1, NUMA_NO_NODE,
- fwspec, false, NULL);
+ if (irq_domain_is_msi_device(domain)) {
+ mutex_unlock(&domain->root->mutex);
+ virq = msi_device_domain_alloc_wired(domain, hwirq, type);
+ mutex_lock(&domain->root->mutex);
+ } else
+ virq = irq_domain_alloc_irqs_locked(domain, -1, 1, NUMA_NO_NODE,
+ fwspec, false, NULL);
if (virq <= 0) {
virq = 0;
goto out;
@@ -914,7 +920,7 @@ void irq_dispose_mapping(unsigned int virq)
return;
if (irq_domain_is_hierarchy(domain)) {
- irq_domain_free_irqs(virq, 1);
+ irq_domain_free_one_irq(domain, virq);
} else {
irq_domain_disassociate(domain, virq);
irq_free_desc(virq);
@@ -1755,6 +1761,14 @@ void irq_domain_free_irqs(unsigned int virq, unsigned int nr_irqs)
irq_free_descs(virq, nr_irqs);
}
+static void irq_domain_free_one_irq(struct irq_domain *domain, unsigned int virq)
+{
+ if (irq_domain_is_msi_device(domain))
+ msi_device_domain_free_wired(domain, virq);
+ else
+ irq_domain_free_irqs(virq, 1);
+}
+
/**
* irq_domain_alloc_irqs_parent - Allocate interrupts from parent domain
* @domain: Domain below which interrupts must be allocated
@@ -1907,9 +1921,9 @@ static int irq_domain_alloc_irqs_locked(struct irq_domain *domain, int irq_base,
return -EINVAL;
}
-static void irq_domain_check_hierarchy(struct irq_domain *domain)
-{
-}
+static void irq_domain_check_hierarchy(struct irq_domain *domain) { }
+static void irq_domain_free_one_irq(struct irq_domain *domain, unsigned int virq) { }
+
#endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */
#ifdef CONFIG_GENERIC_IRQ_DEBUGFS
--
2.34.1
The PLIC driver does not require very early initialization, so let us
convert it into a platform driver.
As part of the conversion, the PLIC probing undergoes the following
changes:
1. Use dev_info(), dev_err() and dev_warn() instead of pr_info(),
pr_err() and pr_warn()
2. Use devm_xyz() APIs wherever applicable
3. PLIC is now probed after CPUs are brought up, so we have to set up the
   cpuhp state after the context handlers of all online CPUs are
   initialized; otherwise we see a crash on multi-socket systems
Signed-off-by: Anup Patel <[email protected]>
---
drivers/irqchip/irq-sifive-plic.c | 239 ++++++++++++++++++------------
1 file changed, 148 insertions(+), 91 deletions(-)
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 5b7bc4fd9517..c8f8a8cdcce1 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -3,7 +3,6 @@
* Copyright (C) 2017 SiFive
* Copyright (C) 2018 Christoph Hellwig
*/
-#define pr_fmt(fmt) "plic: " fmt
#include <linux/cpu.h>
#include <linux/interrupt.h>
#include <linux/io.h>
@@ -64,6 +63,7 @@
#define PLIC_QUIRK_EDGE_INTERRUPT 0
struct plic_priv {
+ struct device *dev;
struct cpumask lmask;
struct irq_domain *irqdomain;
void __iomem *regs;
@@ -85,7 +85,6 @@ struct plic_handler {
struct plic_priv *priv;
};
static int plic_parent_irq __ro_after_init;
-static bool plic_cpuhp_setup_done __ro_after_init;
static DEFINE_PER_CPU(struct plic_handler, plic_handlers);
static int plic_irq_set_type(struct irq_data *d, unsigned int type);
@@ -371,7 +370,8 @@ static void plic_handle_irq(struct irq_desc *desc)
int err = generic_handle_domain_irq(handler->priv->irqdomain,
hwirq);
if (unlikely(err))
- pr_warn_ratelimited("can't find mapping for hwirq %lu\n",
+ dev_warn_ratelimited(handler->priv->dev,
+ "can't find mapping for hwirq %lu\n",
hwirq);
}
@@ -406,57 +406,126 @@ static int plic_starting_cpu(unsigned int cpu)
return 0;
}
-static int __init __plic_init(struct device_node *node,
- struct device_node *parent,
- unsigned long plic_quirks)
+static const struct of_device_id plic_match[] = {
+ { .compatible = "sifive,plic-1.0.0" },
+ { .compatible = "riscv,plic0" },
+ { .compatible = "andestech,nceplic100",
+ .data = (const void *)BIT(PLIC_QUIRK_EDGE_INTERRUPT) },
+ { .compatible = "thead,c900-plic",
+ .data = (const void *)BIT(PLIC_QUIRK_EDGE_INTERRUPT) },
+ {}
+};
+
+static int plic_parse_nr_irqs_and_contexts(struct platform_device *pdev,
+ u32 *nr_irqs, u32 *nr_contexts)
{
- int error = 0, nr_contexts, nr_handlers = 0, i;
- u32 nr_irqs;
- struct plic_priv *priv;
+ struct device *dev = &pdev->dev;
+ int rc;
+
+ /*
+ * Currently, only OF fwnode is supported so extend this
+ * function for ACPI support.
+ */
+ if (!is_of_node(dev->fwnode))
+ return -EINVAL;
+
+ rc = of_property_read_u32(to_of_node(dev->fwnode),
+ "riscv,ndev", nr_irqs);
+ if (rc) {
+ dev_err(dev, "riscv,ndev property not available\n");
+ return rc;
+ }
+
+ *nr_contexts = of_irq_count(to_of_node(dev->fwnode));
+ if (WARN_ON(!(*nr_contexts))) {
+ dev_err(dev, "no PLIC context available\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int plic_parse_context_parent_hwirq(struct platform_device *pdev,
+ u32 context, u32 *parent_hwirq,
+ unsigned long *parent_hartid)
+{
+ struct device *dev = &pdev->dev;
+ struct of_phandle_args parent;
+ int rc;
+
+ /*
+ * Currently, only OF fwnode is supported so extend this
+ * function for ACPI support.
+ */
+ if (!is_of_node(dev->fwnode))
+ return -EINVAL;
+
+ rc = of_irq_parse_one(to_of_node(dev->fwnode), context, &parent);
+ if (rc)
+ return rc;
+
+ rc = riscv_of_parent_hartid(parent.np, parent_hartid);
+ if (rc)
+ return rc;
+
+ *parent_hwirq = parent.args[0];
+ return 0;
+}
+
+static int plic_probe(struct platform_device *pdev)
+{
+ int rc, nr_contexts, nr_handlers = 0, i, cpu;
+ unsigned long plic_quirks = 0, hartid;
+ struct device *dev = &pdev->dev;
struct plic_handler *handler;
- unsigned int cpu;
+ u32 nr_irqs, parent_hwirq;
+ struct irq_domain *domain;
+ struct plic_priv *priv;
+ irq_hw_number_t hwirq;
+ struct resource *res;
+ bool cpuhp_setup;
+
+ if (is_of_node(dev->fwnode)) {
+ const struct of_device_id *id;
+
+ id = of_match_node(plic_match, to_of_node(dev->fwnode));
+ if (id)
+ plic_quirks = (unsigned long)id->data;
+ }
- priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
-
+ priv->dev = dev;
priv->plic_quirks = plic_quirks;
- priv->regs = of_iomap(node, 0);
- if (WARN_ON(!priv->regs)) {
- error = -EIO;
- goto out_free_priv;
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res) {
+ dev_err(dev, "failed to get MMIO resource\n");
+ return -EINVAL;
+ }
+ priv->regs = devm_ioremap(dev, res->start, resource_size(res));
+ if (!priv->regs) {
+ dev_err(dev, "failed map MMIO registers\n");
+ return -EIO;
}
- error = -EINVAL;
- of_property_read_u32(node, "riscv,ndev", &nr_irqs);
- if (WARN_ON(!nr_irqs))
- goto out_iounmap;
-
+ rc = plic_parse_nr_irqs_and_contexts(pdev, &nr_irqs, &nr_contexts);
+ if (rc) {
+ dev_err(dev, "failed to parse irqs and contexts\n");
+ return rc;
+ }
priv->nr_irqs = nr_irqs;
- priv->prio_save = bitmap_alloc(nr_irqs, GFP_KERNEL);
+ priv->prio_save = devm_bitmap_zalloc(dev, nr_irqs, GFP_KERNEL);
if (!priv->prio_save)
- goto out_free_priority_reg;
-
- nr_contexts = of_irq_count(node);
- if (WARN_ON(!nr_contexts))
- goto out_free_priority_reg;
-
- error = -ENOMEM;
- priv->irqdomain = irq_domain_add_linear(node, nr_irqs + 1,
- &plic_irqdomain_ops, priv);
- if (WARN_ON(!priv->irqdomain))
- goto out_free_priority_reg;
+ return -ENOMEM;
for (i = 0; i < nr_contexts; i++) {
- struct of_phandle_args parent;
- irq_hw_number_t hwirq;
- int cpu;
- unsigned long hartid;
-
- if (of_irq_parse_one(node, i, &parent)) {
- pr_err("failed to parse parent for context %d.\n", i);
+ rc = plic_parse_context_parent_hwirq(pdev, i,
+ &parent_hwirq, &hartid);
+ if (rc) {
+ dev_warn(dev, "hwirq for context%d not found\n", i);
continue;
}
@@ -464,7 +533,7 @@ static int __init __plic_init(struct device_node *node,
* Skip contexts other than external interrupts for our
* privilege level.
*/
- if (parent.args[0] != RV_IRQ_EXT) {
+ if (parent_hwirq != RV_IRQ_EXT) {
/* Disable S-mode enable bits if running in M-mode. */
if (IS_ENABLED(CONFIG_RISCV_M_MODE)) {
void __iomem *enable_base = priv->regs +
@@ -477,21 +546,17 @@ static int __init __plic_init(struct device_node *node,
continue;
}
- error = riscv_of_parent_hartid(parent.np, &hartid);
- if (error < 0) {
- pr_warn("failed to parse hart ID for context %d.\n", i);
- continue;
- }
-
cpu = riscv_hartid_to_cpuid(hartid);
if (cpu < 0) {
- pr_warn("Invalid cpuid for context %d\n", i);
+ dev_warn(dev, "Invalid cpuid for context %d\n", i);
continue;
}
/* Find parent domain and register chained handler */
- if (!plic_parent_irq && irq_find_host(parent.np)) {
- plic_parent_irq = irq_of_parse_and_map(node, i);
+ domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
+ DOMAIN_BUS_ANY);
+ if (!plic_parent_irq && domain) {
+ plic_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
if (plic_parent_irq)
irq_set_chained_handler(plic_parent_irq,
plic_handle_irq);
@@ -504,7 +569,7 @@ static int __init __plic_init(struct device_node *node,
*/
handler = per_cpu_ptr(&plic_handlers, cpu);
if (handler->present) {
- pr_warn("handler already present for context %d.\n", i);
+ dev_warn(dev, "handler already present for context%d.\n", i);
plic_set_threshold(handler, PLIC_DISABLE_THRESHOLD);
goto done;
}
@@ -518,10 +583,13 @@ static int __init __plic_init(struct device_node *node,
i * CONTEXT_ENABLE_SIZE;
handler->priv = priv;
- handler->enable_save = kcalloc(DIV_ROUND_UP(nr_irqs, 32),
- sizeof(*handler->enable_save), GFP_KERNEL);
+ handler->enable_save = devm_kcalloc(dev,
+ DIV_ROUND_UP(nr_irqs, 32),
+ sizeof(*handler->enable_save),
+ GFP_KERNEL);
if (!handler->enable_save)
- goto out_free_enable_reg;
+ return -ENOMEM;
+
done:
for (hwirq = 1; hwirq <= nr_irqs; hwirq++) {
plic_toggle(handler, hwirq, 0);
@@ -531,52 +599,41 @@ static int __init __plic_init(struct device_node *node,
nr_handlers++;
}
+ priv->irqdomain = irq_domain_create_linear(dev->fwnode, nr_irqs + 1,
+ &plic_irqdomain_ops, priv);
+ if (WARN_ON(!priv->irqdomain))
+ return -ENOMEM;
+
/*
* We can have multiple PLIC instances so setup cpuhp state
- * and register syscore operations only when context handler
- * for current/boot CPU is present.
+ * and register syscore operations only after context handlers
+ * of all online CPUs are initialized.
*/
- handler = this_cpu_ptr(&plic_handlers);
- if (handler->present && !plic_cpuhp_setup_done) {
+ cpuhp_setup = true;
+ for_each_online_cpu(cpu) {
+ handler = per_cpu_ptr(&plic_handlers, cpu);
+ if (!handler->present) {
+ cpuhp_setup = false;
+ break;
+ }
+ }
+ if (cpuhp_setup) {
cpuhp_setup_state(CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING,
"irqchip/sifive/plic:starting",
plic_starting_cpu, plic_dying_cpu);
register_syscore_ops(&plic_irq_syscore_ops);
- plic_cpuhp_setup_done = true;
}
- pr_info("%pOFP: mapped %d interrupts with %d handlers for"
- " %d contexts.\n", node, nr_irqs, nr_handlers, nr_contexts);
+ dev_info(dev, "mapped %d interrupts with %d handlers for"
+ " %d contexts.\n", nr_irqs, nr_handlers, nr_contexts);
return 0;
-
-out_free_enable_reg:
- for_each_cpu(cpu, cpu_present_mask) {
- handler = per_cpu_ptr(&plic_handlers, cpu);
- kfree(handler->enable_save);
- }
-out_free_priority_reg:
- kfree(priv->prio_save);
-out_iounmap:
- iounmap(priv->regs);
-out_free_priv:
- kfree(priv);
- return error;
}
-static int __init plic_init(struct device_node *node,
- struct device_node *parent)
-{
- return __plic_init(node, parent, 0);
-}
-
-IRQCHIP_DECLARE(sifive_plic, "sifive,plic-1.0.0", plic_init);
-IRQCHIP_DECLARE(riscv_plic0, "riscv,plic0", plic_init); /* for legacy systems */
-
-static int __init plic_edge_init(struct device_node *node,
- struct device_node *parent)
-{
- return __plic_init(node, parent, BIT(PLIC_QUIRK_EDGE_INTERRUPT));
-}
-
-IRQCHIP_DECLARE(andestech_nceplic100, "andestech,nceplic100", plic_edge_init);
-IRQCHIP_DECLARE(thead_c900_plic, "thead,c900-plic", plic_edge_init);
+static struct platform_driver plic_driver = {
+ .driver = {
+ .name = "riscv-plic",
+ .of_match_table = plic_match,
+ },
+ .probe = plic_probe,
+};
+builtin_platform_driver(plic_driver);
--
2.34.1
The RISC-V advanced interrupt architecture (AIA) extends the per-HART
local interrupts in the following ways:
1. Minimum 64 local interrupts for both RV32 and RV64
2. Ability to process multiple pending local interrupts in the same
   interrupt handler
3. Priority configuration for each local interrupt
4. Special CSRs to configure/access the per-HART MSI controller
We add support for #1 and #2 described above in the RISC-V intc driver.
Signed-off-by: Anup Patel <[email protected]>
---
drivers/irqchip/irq-riscv-intc.c | 34 ++++++++++++++++++++++++++------
1 file changed, 28 insertions(+), 6 deletions(-)
diff --git a/drivers/irqchip/irq-riscv-intc.c b/drivers/irqchip/irq-riscv-intc.c
index e8d01b14ccdd..bab536bbaf2c 100644
--- a/drivers/irqchip/irq-riscv-intc.c
+++ b/drivers/irqchip/irq-riscv-intc.c
@@ -17,6 +17,7 @@
#include <linux/module.h>
#include <linux/of.h>
#include <linux/smp.h>
+#include <asm/hwcap.h>
static struct irq_domain *intc_domain;
@@ -30,6 +31,15 @@ static asmlinkage void riscv_intc_irq(struct pt_regs *regs)
generic_handle_domain_irq(intc_domain, cause);
}
+static asmlinkage void riscv_intc_aia_irq(struct pt_regs *regs)
+{
+ unsigned long topi;
+
+ while ((topi = csr_read(CSR_TOPI)))
+ generic_handle_domain_irq(intc_domain,
+ topi >> TOPI_IID_SHIFT);
+}
+
/*
* On RISC-V systems local interrupts are masked or unmasked by writing
* the SIE (Supervisor Interrupt Enable) CSR. As CSRs can only be written
@@ -39,12 +49,18 @@ static asmlinkage void riscv_intc_irq(struct pt_regs *regs)
static void riscv_intc_irq_mask(struct irq_data *d)
{
- csr_clear(CSR_IE, BIT(d->hwirq));
+ if (IS_ENABLED(CONFIG_32BIT) && d->hwirq >= BITS_PER_LONG)
+ csr_clear(CSR_IEH, BIT(d->hwirq - BITS_PER_LONG));
+ else
+ csr_clear(CSR_IE, BIT(d->hwirq));
}
static void riscv_intc_irq_unmask(struct irq_data *d)
{
- csr_set(CSR_IE, BIT(d->hwirq));
+ if (IS_ENABLED(CONFIG_32BIT) && d->hwirq >= BITS_PER_LONG)
+ csr_set(CSR_IEH, BIT(d->hwirq - BITS_PER_LONG));
+ else
+ csr_set(CSR_IE, BIT(d->hwirq));
}
static void riscv_intc_irq_eoi(struct irq_data *d)
@@ -115,16 +131,20 @@ static struct fwnode_handle *riscv_intc_hwnode(void)
static int __init riscv_intc_init_common(struct fwnode_handle *fn)
{
- int rc;
+ int rc, nr_irqs = riscv_isa_extension_available(NULL, SxAIA) ?
+ 64 : BITS_PER_LONG;
- intc_domain = irq_domain_create_linear(fn, BITS_PER_LONG,
+ intc_domain = irq_domain_create_linear(fn, nr_irqs,
&riscv_intc_domain_ops, NULL);
if (!intc_domain) {
pr_err("unable to add IRQ domain\n");
return -ENXIO;
}
- rc = set_handle_irq(&riscv_intc_irq);
+ if (riscv_isa_extension_available(NULL, SxAIA))
+ rc = set_handle_irq(&riscv_intc_aia_irq);
+ else
+ rc = set_handle_irq(&riscv_intc_irq);
if (rc) {
pr_err("failed to set irq handler\n");
return rc;
@@ -132,7 +152,9 @@ static int __init riscv_intc_init_common(struct fwnode_handle *fn)
riscv_set_intc_hwnode_fn(riscv_intc_hwnode);
- pr_info("%d local interrupts mapped\n", BITS_PER_LONG);
+ pr_info("%d local interrupts mapped%s\n",
+ nr_irqs, riscv_isa_extension_available(NULL, SxAIA) ?
+ " using AIA" : "");
return 0;
}
--
2.34.1
From: Björn Töpel <[email protected]>
Some (future) users of the irq matrix allocator do not know the size
of the matrix bitmaps at compile time.
To avoid wasting memory on unnecessarily large bitmaps, size the
bitmaps at matrix allocation time.
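As a usage sketch (this is how the IMSIC driver later in this series
uses the allocator), a caller that only discovers the number of
interrupt identities from firmware at boot can now size the matrix at
runtime:

        /*
         * nr_ids comes from firmware, not from a compile-time constant;
         * the matrix bitmaps are sized from it at allocation time.
         */
        matrix = irq_alloc_matrix(nr_ids + 1, 0, nr_ids + 1);
        if (!matrix)
                return -ENOMEM;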
Signed-off-by: Björn Töpel <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
arch/x86/include/asm/hw_irq.h | 2 --
kernel/irq/matrix.c | 28 +++++++++++++++++-----------
2 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index b02c3cd3c0f6..edebf1020e04 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -16,8 +16,6 @@
#include <asm/irq_vectors.h>
-#define IRQ_MATRIX_BITS NR_VECTORS
-
#ifndef __ASSEMBLY__
#include <linux/percpu.h>
diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 75d0ae490e29..8f222d1cccec 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -8,8 +8,6 @@
#include <linux/cpu.h>
#include <linux/irq.h>
-#define IRQ_MATRIX_SIZE (BITS_TO_LONGS(IRQ_MATRIX_BITS))
-
struct cpumap {
unsigned int available;
unsigned int allocated;
@@ -17,8 +15,8 @@ struct cpumap {
unsigned int managed_allocated;
bool initialized;
bool online;
- unsigned long alloc_map[IRQ_MATRIX_SIZE];
- unsigned long managed_map[IRQ_MATRIX_SIZE];
+ unsigned long *managed_map;
+ unsigned long alloc_map[];
};
struct irq_matrix {
@@ -32,8 +30,8 @@ struct irq_matrix {
unsigned int total_allocated;
unsigned int online_maps;
struct cpumap __percpu *maps;
- unsigned long scratch_map[IRQ_MATRIX_SIZE];
- unsigned long system_map[IRQ_MATRIX_SIZE];
+ unsigned long *system_map;
+ unsigned long scratch_map[];
};
#define CREATE_TRACE_POINTS
@@ -50,24 +48,32 @@ __init struct irq_matrix *irq_alloc_matrix(unsigned int matrix_bits,
unsigned int alloc_start,
unsigned int alloc_end)
{
+ unsigned int cpu, matrix_size = BITS_TO_LONGS(matrix_bits);
struct irq_matrix *m;
- if (matrix_bits > IRQ_MATRIX_BITS)
- return NULL;
-
- m = kzalloc(sizeof(*m), GFP_KERNEL);
+ m = kzalloc(struct_size(m, scratch_map, matrix_size * 2), GFP_KERNEL);
if (!m)
return NULL;
+ m->system_map = &m->scratch_map[matrix_size];
+
m->matrix_bits = matrix_bits;
m->alloc_start = alloc_start;
m->alloc_end = alloc_end;
m->alloc_size = alloc_end - alloc_start;
- m->maps = alloc_percpu(*m->maps);
+ m->maps = __alloc_percpu(struct_size(m->maps, alloc_map, matrix_size * 2),
+ __alignof__(*m->maps));
if (!m->maps) {
kfree(m);
return NULL;
}
+
+ for_each_possible_cpu(cpu) {
+ struct cpumap *cm = per_cpu_ptr(m->maps, cpu);
+
+ cm->managed_map = &cm->alloc_map[matrix_size];
+ }
+
return m;
}
--
2.34.1
The RISC-V advanced interrupt architecture (AIA) specification
defines a new MSI controller called the incoming message signalled
interrupt controller (IMSIC) which manages MSIs on a per-HART (or
per-CPU) basis. It also supports IPIs as software-injected MSIs.
(For more details, refer https://github.com/riscv/riscv-aia)
Let us add an early irqchip driver for the RISC-V IMSIC which sets
up the IMSIC state and provides IPIs.
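Because every HART has its own IMSIC MMIO page, an IPI is simply an MSI
write of the IPI identity to the target HART's page. A minimal sketch
(it mirrors imsic_ipi_send() in the driver below; target_cpu is a
placeholder):

        struct imsic_local_config *local =
                per_cpu_ptr(imsic->global.local, target_cpu);

        /*
         * Writing an interrupt identity to the per-HART MSI page makes
         * that identity pending on the target HART, so an IPI is just
         * an MSI whose data is IMSIC_IPI_ID.
         */
        writel_relaxed(IMSIC_IPI_ID, local->msi_va);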
Signed-off-by: Anup Patel <[email protected]>
---
drivers/irqchip/Kconfig | 7 +
drivers/irqchip/Makefile | 1 +
drivers/irqchip/irq-riscv-imsic-early.c | 241 +++++++
drivers/irqchip/irq-riscv-imsic-state.c | 887 ++++++++++++++++++++++++
drivers/irqchip/irq-riscv-imsic-state.h | 105 +++
include/linux/irqchip/riscv-imsic.h | 87 +++
6 files changed, 1328 insertions(+)
create mode 100644 drivers/irqchip/irq-riscv-imsic-early.c
create mode 100644 drivers/irqchip/irq-riscv-imsic-state.c
create mode 100644 drivers/irqchip/irq-riscv-imsic-state.h
create mode 100644 include/linux/irqchip/riscv-imsic.h
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index f7149d0f3d45..85f86e31c996 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -546,6 +546,13 @@ config SIFIVE_PLIC
select IRQ_DOMAIN_HIERARCHY
select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
+config RISCV_IMSIC
+ bool
+ depends on RISCV
+ select IRQ_DOMAIN_HIERARCHY
+ select GENERIC_IRQ_MATRIX_ALLOCATOR
+ select GENERIC_MSI_IRQ
+
config EXYNOS_IRQ_COMBINER
bool "Samsung Exynos IRQ combiner support" if COMPILE_TEST
depends on (ARCH_EXYNOS && ARM) || COMPILE_TEST
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index ffd945fe71aa..d714724387ce 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -95,6 +95,7 @@ obj-$(CONFIG_QCOM_MPM) += irq-qcom-mpm.o
obj-$(CONFIG_CSKY_MPINTC) += irq-csky-mpintc.o
obj-$(CONFIG_CSKY_APB_INTC) += irq-csky-apb-intc.o
obj-$(CONFIG_RISCV_INTC) += irq-riscv-intc.o
+obj-$(CONFIG_RISCV_IMSIC) += irq-riscv-imsic-state.o irq-riscv-imsic-early.o
obj-$(CONFIG_SIFIVE_PLIC) += irq-sifive-plic.o
obj-$(CONFIG_IMX_IRQSTEER) += irq-imx-irqsteer.o
obj-$(CONFIG_IMX_INTMUX) += irq-imx-intmux.o
diff --git a/drivers/irqchip/irq-riscv-imsic-early.c b/drivers/irqchip/irq-riscv-imsic-early.c
new file mode 100644
index 000000000000..3557e32a713c
--- /dev/null
+++ b/drivers/irqchip/irq-riscv-imsic-early.c
@@ -0,0 +1,241 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2022 Ventana Micro Systems Inc.
+ */
+
+#define pr_fmt(fmt) "riscv-imsic: " fmt
+#include <linux/cpu.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/irqchip.h>
+#include <linux/irqchip/chained_irq.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+
+#include "irq-riscv-imsic-state.h"
+
+static int imsic_parent_irq;
+
+#ifdef CONFIG_SMP
+static irqreturn_t imsic_local_sync_handler(int irq, void *data)
+{
+ imsic_local_sync();
+ return IRQ_HANDLED;
+}
+
+static void imsic_ipi_send(unsigned int cpu)
+{
+ struct imsic_local_config *local =
+ per_cpu_ptr(imsic->global.local, cpu);
+
+ writel_relaxed(IMSIC_IPI_ID, local->msi_va);
+}
+
+static void imsic_ipi_starting_cpu(void)
+{
+ /* Enable IPIs for current CPU. */
+ __imsic_id_set_enable(IMSIC_IPI_ID);
+
+ /* Enable virtual IPI used for IMSIC ID synchronization */
+ enable_percpu_irq(imsic->ipi_virq, 0);
+}
+
+static void imsic_ipi_dying_cpu(void)
+{
+ /*
+ * Disable virtual IPI used for IMSIC ID synchronization so
+ * that we don't receive ID synchronization requests.
+ */
+ disable_percpu_irq(imsic->ipi_virq);
+}
+
+static int __init imsic_ipi_domain_init(void)
+{
+ int virq;
+
+ /* Create IMSIC IPI multiplexing */
+ virq = ipi_mux_create(IMSIC_NR_IPI, imsic_ipi_send);
+ if (virq <= 0)
+ return (virq < 0) ? virq : -ENOMEM;
+ imsic->ipi_virq = virq;
+
+ /* First vIRQ is used for IMSIC ID synchronization */
+ virq = request_percpu_irq(imsic->ipi_virq, imsic_local_sync_handler,
+ "riscv-imsic-lsync", imsic->global.local);
+ if (virq)
+ return virq;
+ irq_set_status_flags(imsic->ipi_virq, IRQ_HIDDEN);
+ imsic->ipi_lsync_desc = irq_to_desc(imsic->ipi_virq);
+
+ /* Set vIRQ range */
+ riscv_ipi_set_virq_range(imsic->ipi_virq + 1, IMSIC_NR_IPI - 1, true);
+
+ /* Announce that IMSIC is providing IPIs */
+ pr_info("%pfwP: providing IPIs using interrupt %d\n",
+ imsic->fwnode, IMSIC_IPI_ID);
+
+ return 0;
+}
+#else
+static void imsic_ipi_starting_cpu(void)
+{
+}
+
+static void imsic_ipi_dying_cpu(void)
+{
+}
+
+static int __init imsic_ipi_domain_init(void)
+{
+ return 0;
+}
+#endif
+
+/*
+ * To handle an interrupt, we read the TOPEI CSR and write zero in one
+ * instruction. If the TOPEI CSR is non-zero then we translate TOPEI.ID to
+ * a Linux interrupt number and let the Linux IRQ subsystem handle it.
+ */
+static void imsic_handle_irq(struct irq_desc *desc)
+{
+ struct irq_chip *chip = irq_desc_get_chip(desc);
+ int err, cpu = smp_processor_id();
+ struct imsic_vector *vec;
+ unsigned long local_id;
+
+ chained_irq_enter(chip, desc);
+
+ while ((local_id = csr_swap(CSR_TOPEI, 0))) {
+ local_id = local_id >> TOPEI_ID_SHIFT;
+
+ if (local_id == IMSIC_IPI_ID) {
+#ifdef CONFIG_SMP
+ ipi_mux_process();
+#endif
+ continue;
+ }
+
+ if (unlikely(!imsic->base_domain))
+ continue;
+
+ vec = imsic_vector_from_local_id(cpu, local_id);
+ if (!vec) {
+ pr_warn_ratelimited(
+ "vector not found for local ID 0x%lx\n",
+ local_id);
+ continue;
+ }
+
+ err = generic_handle_domain_irq(imsic->base_domain,
+ vec->hwirq);
+ if (unlikely(err))
+ pr_warn_ratelimited(
+ "hwirq 0x%x mapping not found\n",
+ vec->hwirq);
+ }
+
+ chained_irq_exit(chip, desc);
+}
+
+static int imsic_starting_cpu(unsigned int cpu)
+{
+ /* Mark per-CPU IMSIC state as online */
+ imsic_state_online();
+
+ /* Enable per-CPU parent interrupt */
+ enable_percpu_irq(imsic_parent_irq,
+ irq_get_trigger_type(imsic_parent_irq));
+
+ /* Setup IPIs */
+ imsic_ipi_starting_cpu();
+
+ /*
+ * Interrupt identities might have been enabled/disabled while
+ * this CPU was not running, so sync-up the local enable/disable state.
+ */
+ imsic_local_sync();
+
+ /* Enable local interrupt delivery */
+ imsic_local_delivery(true);
+
+ return 0;
+}
+
+static int imsic_dying_cpu(unsigned int cpu)
+{
+ /* Cleanup IPIs */
+ imsic_ipi_dying_cpu();
+
+ /* Mark per-CPU IMSIC state as offline */
+ imsic_state_offline();
+
+ return 0;
+}
+
+static int __init imsic_early_probe(struct fwnode_handle *fwnode)
+{
+ int rc;
+ struct irq_domain *domain;
+
+ /* Find parent domain and register chained handler */
+ domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
+ DOMAIN_BUS_ANY);
+ if (!domain) {
+ pr_err("%pfwP: Failed to find INTC domain\n", fwnode);
+ return -ENOENT;
+ }
+ imsic_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
+ if (!imsic_parent_irq) {
+ pr_err("%pfwP: Failed to create INTC mapping\n", fwnode);
+ return -ENOENT;
+ }
+ irq_set_chained_handler(imsic_parent_irq, imsic_handle_irq);
+
+ /* Initialize IPI domain */
+ rc = imsic_ipi_domain_init();
+ if (rc) {
+ pr_err("%pfwP: Failed to initialize IPI domain\n", fwnode);
+ return rc;
+ }
+
+ /*
+ * Setup cpuhp state (must be done after setting imsic_parent_irq)
+ *
+ * Don't disable the per-CPU IMSIC file when a CPU goes offline
+ * because this affects IPIs; the masking/unmasking of virtual
+ * IPIs is done via the generic IPI-Mux.
+ */
+ cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+ "irqchip/riscv/imsic:starting",
+ imsic_starting_cpu, imsic_dying_cpu);
+
+ return 0;
+}
+
+static int __init imsic_early_dt_init(struct device_node *node,
+ struct device_node *parent)
+{
+ int rc;
+ struct fwnode_handle *fwnode = &node->fwnode;
+
+ /* Setup IMSIC state */
+ rc = imsic_setup_state(fwnode);
+ if (rc) {
+ pr_err("%pfwP: failed to setup state (error %d)\n",
+ fwnode, rc);
+ return rc;
+ }
+
+ /* Do early setup of IPIs */
+ rc = imsic_early_probe(fwnode);
+ if (rc)
+ return rc;
+
+ /* Ensure that OF platform device gets probed */
+ of_node_clear_flag(node, OF_POPULATED);
+ return 0;
+}
+IRQCHIP_DECLARE(riscv_imsic, "riscv,imsics", imsic_early_dt_init);
diff --git a/drivers/irqchip/irq-riscv-imsic-state.c b/drivers/irqchip/irq-riscv-imsic-state.c
new file mode 100644
index 000000000000..66389a6e558f
--- /dev/null
+++ b/drivers/irqchip/irq-riscv-imsic-state.c
@@ -0,0 +1,887 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2022 Ventana Micro Systems Inc.
+ */
+
+#define pr_fmt(fmt) "riscv-imsic: " fmt
+#include <linux/cpu.h>
+#include <linux/bitmap.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+#include <asm/hwcap.h>
+
+#include "irq-riscv-imsic-state.h"
+
+#define IMSIC_DISABLE_EIDELIVERY 0
+#define IMSIC_ENABLE_EIDELIVERY 1
+#define IMSIC_DISABLE_EITHRESHOLD 1
+#define IMSIC_ENABLE_EITHRESHOLD 0
+
+#define imsic_csr_write(__c, __v) \
+do { \
+ csr_write(CSR_ISELECT, __c); \
+ csr_write(CSR_IREG, __v); \
+} while (0)
+
+#define imsic_csr_read(__c) \
+({ \
+ unsigned long __v; \
+ csr_write(CSR_ISELECT, __c); \
+ __v = csr_read(CSR_IREG); \
+ __v; \
+})
+
+#define imsic_csr_read_clear(__c, __v) \
+({ \
+ unsigned long __r; \
+ csr_write(CSR_ISELECT, __c); \
+ __r = csr_read_clear(CSR_IREG, __v); \
+ __r; \
+})
+
+#define imsic_csr_set(__c, __v) \
+do { \
+ csr_write(CSR_ISELECT, __c); \
+ csr_set(CSR_IREG, __v); \
+} while (0)
+
+#define imsic_csr_clear(__c, __v) \
+do { \
+ csr_write(CSR_ISELECT, __c); \
+ csr_clear(CSR_IREG, __v); \
+} while (0)
+
+struct imsic_priv *imsic;
+
+const struct imsic_global_config *imsic_get_global_config(void)
+{
+ return imsic ? &imsic->global : NULL;
+}
+EXPORT_SYMBOL_GPL(imsic_get_global_config);
+
+static bool __imsic_eix_read_clear(unsigned long id, bool pend)
+{
+ unsigned long isel, imask;
+
+ isel = id / BITS_PER_LONG;
+ isel *= BITS_PER_LONG / IMSIC_EIPx_BITS;
+ isel += pend ? IMSIC_EIP0 : IMSIC_EIE0;
+ imask = BIT(id & (__riscv_xlen - 1));
+
+ return (imsic_csr_read_clear(isel, imask) & imask) ? true : false;
+}
+
+#define __imsic_id_read_clear_enabled(__id) \
+ __imsic_eix_read_clear((__id), false)
+#define __imsic_id_read_clear_pending(__id) \
+ __imsic_eix_read_clear((__id), true)
+
+void __imsic_eix_update(unsigned long base_id,
+ unsigned long num_id, bool pend, bool val)
+{
+ unsigned long i, isel, ireg;
+ unsigned long id = base_id, last_id = base_id + num_id;
+
+ while (id < last_id) {
+ isel = id / BITS_PER_LONG;
+ isel *= BITS_PER_LONG / IMSIC_EIPx_BITS;
+ isel += (pend) ? IMSIC_EIP0 : IMSIC_EIE0;
+
+ ireg = 0;
+ for (i = id & (__riscv_xlen - 1);
+ (id < last_id) && (i < __riscv_xlen); i++) {
+ ireg |= BIT(i);
+ id++;
+ }
+
+ /*
+ * The IMSIC EIEx and EIPx registers are indirectly
+ * accessed using the ISELECT and IREG CSRs so we
+ * need to access these CSRs without getting preempted.
+ *
+ * All existing users of this function call this
+ * function with local IRQs disabled so we don't
+ * need to do anything special here.
+ */
+ if (val)
+ imsic_csr_set(isel, ireg);
+ else
+ imsic_csr_clear(isel, ireg);
+ }
+}
+
+void imsic_local_sync(void)
+{
+ struct imsic_local_priv *lpriv = this_cpu_ptr(imsic->lpriv);
+ struct imsic_local_config *mlocal;
+ struct imsic_vector *mvec;
+ unsigned long flags;
+ int i;
+
+ raw_spin_lock_irqsave(&lpriv->ids_lock, flags);
+ for (i = 1; i <= imsic->global.nr_ids; i++) {
+ if (i == IMSIC_IPI_ID)
+ continue;
+
+ if (test_bit(i, lpriv->ids_enabled_bitmap))
+ __imsic_id_set_enable(i);
+ else
+ __imsic_id_clear_enable(i);
+
+ mvec = lpriv->ids_move[i];
+ lpriv->ids_move[i] = NULL;
+ if (mvec) {
+ if (__imsic_id_read_clear_pending(i)) {
+ mlocal = per_cpu_ptr(imsic->global.local,
+ mvec->cpu);
+ writel_relaxed(mvec->local_id, mlocal->msi_va);
+ }
+
+ imsic_vector_free(&lpriv->vectors[i]);
+ }
+
+ }
+ raw_spin_unlock_irqrestore(&lpriv->ids_lock, flags);
+}
+
+void imsic_local_delivery(bool enable)
+{
+ if (enable) {
+ imsic_csr_write(IMSIC_EITHRESHOLD, IMSIC_ENABLE_EITHRESHOLD);
+ imsic_csr_write(IMSIC_EIDELIVERY, IMSIC_ENABLE_EIDELIVERY);
+ return;
+ }
+
+ imsic_csr_write(IMSIC_EIDELIVERY, IMSIC_DISABLE_EIDELIVERY);
+ imsic_csr_write(IMSIC_EITHRESHOLD, IMSIC_DISABLE_EITHRESHOLD);
+}
+
+#ifdef CONFIG_SMP
+static void imsic_remote_sync(unsigned int cpu)
+{
+ /*
+ * We simply inject an ID synchronization IPI to a target CPU
+ * if it is not the same as the current CPU. The ipi_send_mask()
+ * implementation of IPI mux will inject ID synchronization
+ * IPI only for CPUs that have enabled it so offline CPUs
+ * won't receive IPI. An offline CPU will unconditionally
+ * synchronize IDs through imsic_starting_cpu() when the
+ * CPU is brought up.
+ */
+ if (cpu_online(cpu)) {
+ if (cpu != smp_processor_id())
+ __ipi_send_mask(imsic->ipi_lsync_desc, cpumask_of(cpu));
+ else
+ imsic_local_sync();
+ }
+}
+#else
+static inline void imsic_remote_sync(unsigned int cpu)
+{
+ imsic_local_sync();
+}
+#endif
+
+void imsic_vector_mask(struct imsic_vector *vec)
+{
+ struct imsic_local_priv *lpriv;
+ unsigned long flags;
+
+ lpriv = per_cpu_ptr(imsic->lpriv, vec->cpu);
+ if (WARN_ON(&lpriv->vectors[vec->local_id] != vec))
+ return;
+
+ raw_spin_lock_irqsave(&lpriv->ids_lock, flags);
+ bitmap_clear(lpriv->ids_enabled_bitmap, vec->local_id, 1);
+ raw_spin_unlock_irqrestore(&lpriv->ids_lock, flags);
+
+ imsic_remote_sync(vec->cpu);
+}
+
+void imsic_vector_unmask(struct imsic_vector *vec)
+{
+ struct imsic_local_priv *lpriv;
+ unsigned long flags;
+
+ lpriv = per_cpu_ptr(imsic->lpriv, vec->cpu);
+ if (WARN_ON(&lpriv->vectors[vec->local_id] != vec))
+ return;
+
+ raw_spin_lock_irqsave(&lpriv->ids_lock, flags);
+ bitmap_set(lpriv->ids_enabled_bitmap, vec->local_id, 1);
+ raw_spin_unlock_irqrestore(&lpriv->ids_lock, flags);
+
+ imsic_remote_sync(vec->cpu);
+}
+
+void imsic_vector_move(struct imsic_vector *old_vec,
+ struct imsic_vector *new_vec)
+{
+ struct imsic_local_priv *old_lpriv, *new_lpriv;
+ unsigned long flags, flags1;
+
+ if (WARN_ON(old_vec->cpu == new_vec->cpu))
+ return;
+
+ old_lpriv = per_cpu_ptr(imsic->lpriv, old_vec->cpu);
+ if (WARN_ON(&old_lpriv->vectors[old_vec->local_id] != old_vec))
+ return;
+
+ new_lpriv = per_cpu_ptr(imsic->lpriv, new_vec->cpu);
+ if (WARN_ON(&new_lpriv->vectors[new_vec->local_id] != new_vec))
+ return;
+
+ raw_spin_lock_irqsave(&old_lpriv->ids_lock, flags);
+ raw_spin_lock_irqsave(&new_lpriv->ids_lock, flags1);
+
+ /* Unmask the new vector entry */
+ if (test_bit(old_vec->local_id, old_lpriv->ids_enabled_bitmap))
+ bitmap_set(new_lpriv->ids_enabled_bitmap,
+ new_vec->local_id, 1);
+
+ /* Mask the old vector entry */
+ bitmap_clear(old_lpriv->ids_enabled_bitmap, old_vec->local_id, 1);
+
+ /*
+ * Move and re-trigger the new vector based on the pending
+ * state of the old vector because we might get a device
+ * interrupt on the old vector while device was being moved
+ * to the new vector.
+ */
+ old_lpriv->ids_move[old_vec->local_id] = new_vec;
+
+ raw_spin_unlock_irqrestore(&new_lpriv->ids_lock, flags1);
+ raw_spin_unlock_irqrestore(&old_lpriv->ids_lock, flags);
+
+ imsic_remote_sync(old_vec->cpu);
+ imsic_remote_sync(new_vec->cpu);
+}
+
+#ifdef CONFIG_GENERIC_IRQ_DEBUGFS
+void imsic_vector_debug_show(struct seq_file *m,
+ struct imsic_vector *vec, int ind)
+{
+ unsigned int mcpu = 0, mlocal_id = 0;
+ struct imsic_local_priv *lpriv;
+ bool move_in_progress = false;
+ struct imsic_vector *mvec;
+ bool is_enabled = false;
+ unsigned long flags;
+
+ lpriv = per_cpu_ptr(imsic->lpriv, vec->cpu);
+ if (WARN_ON(&lpriv->vectors[vec->local_id] != vec))
+ return;
+
+ raw_spin_lock_irqsave(&lpriv->ids_lock, flags);
+ if (test_bit(vec->local_id, lpriv->ids_enabled_bitmap))
+ is_enabled = true;
+ mvec = lpriv->ids_move[vec->local_id];
+ if (mvec) {
+ move_in_progress = true;
+ mcpu = mvec->cpu;
+ mlocal_id = mvec->local_id;
+ }
+ raw_spin_unlock_irqrestore(&lpriv->ids_lock, flags);
+
+ seq_printf(m, "%*starget_cpu : %5u\n", ind, "", vec->cpu);
+ seq_printf(m, "%*starget_local_id : %5u\n", ind, "", vec->local_id);
+ seq_printf(m, "%*sis_reserved : %5u\n", ind, "",
+ (vec->local_id <= IMSIC_IPI_ID) ? 1 : 0);
+ seq_printf(m, "%*sis_enabled : %5u\n", ind, "",
+ (is_enabled) ? 1 : 0);
+ seq_printf(m, "%*sis_move_pending : %5u\n", ind, "",
+ (move_in_progress) ? 1 : 0);
+ if (move_in_progress) {
+ seq_printf(m, "%*smove_cpu : %5u\n", ind, "", mcpu);
+ seq_printf(m, "%*smove_local_id : %5u\n", ind, "", mlocal_id);
+ }
+}
+
+void imsic_vector_debug_show_summary(struct seq_file *m, int ind)
+{
+ irq_matrix_debug_show(m, imsic->matrix, ind);
+}
+#endif
+
+struct imsic_vector *imsic_vector_from_local_id(unsigned int cpu,
+ unsigned int local_id)
+{
+ struct imsic_local_priv *lpriv = per_cpu_ptr(imsic->lpriv, cpu);
+
+ if (!lpriv || imsic->global.nr_ids < local_id)
+ return NULL;
+
+ return &lpriv->vectors[local_id];
+}
+
+struct imsic_vector *imsic_vector_alloc(unsigned int hwirq,
+ const struct cpumask *mask)
+{
+ struct imsic_vector *vec = NULL;
+ struct imsic_local_priv *lpriv;
+ unsigned long flags;
+ unsigned int cpu;
+ int local_id;
+
+ raw_spin_lock_irqsave(&imsic->matrix_lock, flags);
+ local_id = irq_matrix_alloc(imsic->matrix, mask, false, &cpu);
+ raw_spin_unlock_irqrestore(&imsic->matrix_lock, flags);
+ if (local_id < 0)
+ return NULL;
+
+ lpriv = per_cpu_ptr(imsic->lpriv, cpu);
+ vec = &lpriv->vectors[local_id];
+ vec->hwirq = hwirq;
+
+ return vec;
+}
+
+void imsic_vector_free(struct imsic_vector *vec)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&imsic->matrix_lock, flags);
+ vec->hwirq = UINT_MAX;
+ irq_matrix_free(imsic->matrix, vec->cpu, vec->local_id, false);
+ raw_spin_unlock_irqrestore(&imsic->matrix_lock, flags);
+}
+
+static void __init imsic_local_cleanup(void)
+{
+ int cpu;
+ struct imsic_local_priv *lpriv;
+
+ for_each_possible_cpu(cpu) {
+ lpriv = per_cpu_ptr(imsic->lpriv, cpu);
+
+ bitmap_free(lpriv->ids_enabled_bitmap);
+ kfree(lpriv->ids_move);
+ kfree(lpriv->vectors);
+ }
+
+ free_percpu(imsic->lpriv);
+}
+
+static int __init imsic_local_init(void)
+{
+ struct imsic_global_config *global = &imsic->global;
+ struct imsic_local_priv *lpriv;
+ struct imsic_vector *vec;
+ int cpu, i;
+
+ /* Allocate per-CPU private state */
+ imsic->lpriv = alloc_percpu(typeof(*(imsic->lpriv)));
+ if (!imsic->lpriv)
+ return -ENOMEM;
+
+ /* Setup per-CPU private state */
+ for_each_possible_cpu(cpu) {
+ lpriv = per_cpu_ptr(imsic->lpriv, cpu);
+
+ raw_spin_lock_init(&lpriv->ids_lock);
+
+ /* Allocate enabled bitmap */
+ lpriv->ids_enabled_bitmap = bitmap_zalloc(global->nr_ids + 1,
+ GFP_KERNEL);
+ if (!lpriv->ids_enabled_bitmap) {
+ imsic_local_cleanup();
+ return -ENOMEM;
+ }
+
+ /* Allocate move array */
+ lpriv->ids_move = kcalloc(global->nr_ids + 1,
+ sizeof(*lpriv->ids_move), GFP_KERNEL);
+ if (!lpriv->ids_move) {
+ imsic_local_cleanup();
+ return -ENOMEM;
+ }
+
+ /* Allocate vector array */
+ lpriv->vectors = kcalloc(global->nr_ids + 1,
+ sizeof(*lpriv->vectors), GFP_KERNEL);
+ if (!lpriv->vectors) {
+ imsic_local_cleanup();
+ return -ENOMEM;
+ }
+
+ /* Setup vector array */
+ for (i = 0; i <= global->nr_ids; i++) {
+ vec = &lpriv->vectors[i];
+ vec->cpu = cpu;
+ vec->local_id = i;
+ vec->hwirq = UINT_MAX;
+ }
+ }
+
+ return 0;
+}
+
+int imsic_hwirq_alloc(void)
+{
+ int ret;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&imsic->hwirqs_lock, flags);
+ ret = bitmap_find_free_region(imsic->hwirqs_used_bitmap,
+ imsic->nr_hwirqs, 0);
+ raw_spin_unlock_irqrestore(&imsic->hwirqs_lock, flags);
+
+ return ret;
+}
+
+void imsic_hwirq_free(unsigned int hwirq)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&imsic->hwirqs_lock, flags);
+ bitmap_release_region(imsic->hwirqs_used_bitmap, hwirq, 0);
+ raw_spin_unlock_irqrestore(&imsic->hwirqs_lock, flags);
+}
+
+static int __init imsic_hwirqs_init(void)
+{
+ struct imsic_global_config *global = &imsic->global;
+
+ imsic->nr_hwirqs = num_possible_cpus() * (global->nr_ids - 1);
+
+ raw_spin_lock_init(&imsic->hwirqs_lock);
+
+ imsic->hwirqs_used_bitmap = bitmap_zalloc(imsic->nr_hwirqs,
+ GFP_KERNEL);
+ if (!imsic->hwirqs_used_bitmap)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void __init imsic_hwirqs_cleanup(void)
+{
+ bitmap_free(imsic->hwirqs_used_bitmap);
+}
+
+void imsic_state_online(void)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&imsic->matrix_lock, flags);
+ irq_matrix_online(imsic->matrix);
+ raw_spin_unlock_irqrestore(&imsic->matrix_lock, flags);
+}
+
+void imsic_state_offline(void)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&imsic->matrix_lock, flags);
+ irq_matrix_offline(imsic->matrix);
+ raw_spin_unlock_irqrestore(&imsic->matrix_lock, flags);
+}
+
+static int __init imsic_matrix_init(void)
+{
+ struct imsic_global_config *global = &imsic->global;
+
+ raw_spin_lock_init(&imsic->matrix_lock);
+ imsic->matrix = irq_alloc_matrix(global->nr_ids + 1,
+ 0, global->nr_ids + 1);
+ if (!imsic->matrix)
+ return -ENOMEM;
+
+ /* Reserve ID#0 because it is special and never implemented */
+ irq_matrix_assign_system(imsic->matrix, 0, false);
+
+ /* Reserve IPI ID because it is special and used internally */
+ irq_matrix_assign_system(imsic->matrix, IMSIC_IPI_ID, false);
+
+ return 0;
+}
+
+static int __init imsic_get_parent_hartid(struct fwnode_handle *fwnode,
+ u32 index, unsigned long *hartid)
+{
+ int rc;
+ struct of_phandle_args parent;
+
+ /*
+ * Currently, only OF fwnode is supported so extend this
+ * function for ACPI support.
+ */
+ if (!is_of_node(fwnode))
+ return -EINVAL;
+
+ rc = of_irq_parse_one(to_of_node(fwnode), index, &parent);
+ if (rc)
+ return rc;
+
+ /*
+ * Skip interrupts other than external interrupts for
+ * current privilege level.
+ */
+ if (parent.args[0] != RV_IRQ_EXT)
+ return -EINVAL;
+
+ return riscv_of_parent_hartid(parent.np, hartid);
+}
+
+static int __init imsic_get_mmio_resource(struct fwnode_handle *fwnode,
+ u32 index, struct resource *res)
+{
+ /*
+ * Currently, only OF fwnode is supported so extend this
+ * function for ACPI support.
+ */
+ if (!is_of_node(fwnode))
+ return -EINVAL;
+
+ return of_address_to_resource(to_of_node(fwnode), index, res);
+}
+
+static int __init imsic_parse_fwnode(struct fwnode_handle *fwnode,
+ struct imsic_global_config *global,
+ u32 *nr_parent_irqs,
+ u32 *nr_mmios)
+{
+ unsigned long hartid;
+ struct resource res;
+ int rc;
+ u32 i;
+
+ /*
+ * Currently, only OF fwnode is supported so extend this
+ * function for ACPI support.
+ */
+ if (!is_of_node(fwnode))
+ return -EINVAL;
+
+ *nr_parent_irqs = 0;
+ *nr_mmios = 0;
+
+ /* Find number of parent interrupts */
+ *nr_parent_irqs = 0;
+ while (!imsic_get_parent_hartid(fwnode, *nr_parent_irqs, &hartid))
+ (*nr_parent_irqs)++;
+ if (!(*nr_parent_irqs)) {
+ pr_err("%pfwP: no parent irqs available\n", fwnode);
+ return -EINVAL;
+ }
+
+ /* Find number of guest index bits in MSI address */
+ rc = of_property_read_u32(to_of_node(fwnode),
+ "riscv,guest-index-bits",
+ &global->guest_index_bits);
+ if (rc)
+ global->guest_index_bits = 0;
+
+ /* Find number of HART index bits */
+ rc = of_property_read_u32(to_of_node(fwnode),
+ "riscv,hart-index-bits",
+ &global->hart_index_bits);
+ if (rc) {
+ /* Assume default value */
+ global->hart_index_bits = __fls(*nr_parent_irqs);
+ if (BIT(global->hart_index_bits) < *nr_parent_irqs)
+ global->hart_index_bits++;
+ }
+
+ /* Find number of group index bits */
+ rc = of_property_read_u32(to_of_node(fwnode),
+ "riscv,group-index-bits",
+ &global->group_index_bits);
+ if (rc)
+ global->group_index_bits = 0;
+
+ /*
+ * Find first bit position of group index.
+ * If not specified, assume the default APLIC-IMSIC configuration.
+ */
+ rc = of_property_read_u32(to_of_node(fwnode),
+ "riscv,group-index-shift",
+ &global->group_index_shift);
+ if (rc)
+ global->group_index_shift = IMSIC_MMIO_PAGE_SHIFT * 2;
+
+ /* Find number of interrupt identities */
+ rc = of_property_read_u32(to_of_node(fwnode),
+ "riscv,num-ids",
+ &global->nr_ids);
+ if (rc) {
+ pr_err("%pfwP: number of interrupt identities not found\n",
+ fwnode);
+ return rc;
+ }
+
+ /* Find number of guest interrupt identities */
+ rc = of_property_read_u32(to_of_node(fwnode),
+ "riscv,num-guest-ids",
+ &global->nr_guest_ids);
+ if (rc)
+ global->nr_guest_ids = global->nr_ids;
+
+ /* Sanity check guest index bits */
+ i = BITS_PER_LONG - IMSIC_MMIO_PAGE_SHIFT;
+ if (i < global->guest_index_bits) {
+ pr_err("%pfwP: guest index bits too big\n", fwnode);
+ return -EINVAL;
+ }
+
+ /* Sanity check HART index bits */
+ i = BITS_PER_LONG - IMSIC_MMIO_PAGE_SHIFT - global->guest_index_bits;
+ if (i < global->hart_index_bits) {
+ pr_err("%pfwP: HART index bits too big\n", fwnode);
+ return -EINVAL;
+ }
+
+ /* Sanity check group index bits */
+ i = BITS_PER_LONG - IMSIC_MMIO_PAGE_SHIFT -
+ global->guest_index_bits - global->hart_index_bits;
+ if (i < global->group_index_bits) {
+ pr_err("%pfwP: group index bits too big\n", fwnode);
+ return -EINVAL;
+ }
+
+ /* Sanity check group index shift */
+ i = global->group_index_bits + global->group_index_shift - 1;
+ if (i >= BITS_PER_LONG) {
+ pr_err("%pfwP: group index shift too big\n", fwnode);
+ return -EINVAL;
+ }
+
+ /* Sanity check number of interrupt identities */
+ if ((global->nr_ids < IMSIC_MIN_ID) ||
+ (global->nr_ids >= IMSIC_MAX_ID) ||
+ ((global->nr_ids & IMSIC_MIN_ID) != IMSIC_MIN_ID)) {
+ pr_err("%pfwP: invalid number of interrupt identities\n",
+ fwnode);
+ return -EINVAL;
+ }
+
+ /* Sanity check number of guest interrupt identities */
+ if ((global->nr_guest_ids < IMSIC_MIN_ID) ||
+ (global->nr_guest_ids >= IMSIC_MAX_ID) ||
+ ((global->nr_guest_ids & IMSIC_MIN_ID) != IMSIC_MIN_ID)) {
+ pr_err("%pfwP: invalid number of guest interrupt identities\n",
+ fwnode);
+ return -EINVAL;
+ }
+
+ /* Compute base address */
+ rc = imsic_get_mmio_resource(fwnode, 0, &res);
+ if (rc) {
+ pr_err("%pfwP: first MMIO resource not found\n", fwnode);
+ return -EINVAL;
+ }
+ global->base_addr = res.start;
+ global->base_addr &= ~(BIT(global->guest_index_bits +
+ global->hart_index_bits +
+ IMSIC_MMIO_PAGE_SHIFT) - 1);
+ global->base_addr &= ~((BIT(global->group_index_bits) - 1) <<
+ global->group_index_shift);
+
+ /* Find number of MMIO register sets */
+ while (!imsic_get_mmio_resource(fwnode, *nr_mmios, &res))
+ (*nr_mmios)++;
+
+ return 0;
+}
+
+int __init imsic_setup_state(struct fwnode_handle *fwnode)
+{
+ int rc, cpu;
+ phys_addr_t base_addr;
+ void __iomem **mmios_va = NULL;
+ struct resource *mmios = NULL;
+ struct imsic_local_config *local;
+ struct imsic_global_config *global;
+ unsigned long reloff, hartid;
+ u32 i, j, index, nr_parent_irqs, nr_mmios, nr_handlers = 0;
+
+ /*
+ * Only one IMSIC instance allowed in a platform for clean
+ * implementation of SMP IRQ affinity and per-CPU IPIs.
+ *
+ * This means on a multi-socket (or multi-die) platform we
+ * will have multiple MMIO regions for one IMSIC instance.
+ */
+ if (imsic) {
+ pr_err("%pfwP: already initialized hence ignoring\n",
+ fwnode);
+ return -EALREADY;
+ }
+
+ if (!riscv_isa_extension_available(NULL, SxAIA)) {
+ pr_err("%pfwP: AIA support not available\n", fwnode);
+ return -ENODEV;
+ }
+
+ imsic = kzalloc(sizeof(*imsic), GFP_KERNEL);
+ if (!imsic)
+ return -ENOMEM;
+ imsic->fwnode = fwnode;
+ global = &imsic->global;
+
+ global->local = alloc_percpu(typeof(*(global->local)));
+ if (!global->local) {
+ rc = -ENOMEM;
+ goto out_free_priv;
+ }
+
+ /* Parse IMSIC fwnode */
+ rc = imsic_parse_fwnode(fwnode, global, &nr_parent_irqs, &nr_mmios);
+ if (rc)
+ goto out_free_local;
+
+ /* Allocate MMIO resource array */
+ mmios = kcalloc(nr_mmios, sizeof(*mmios), GFP_KERNEL);
+ if (!mmios) {
+ rc = -ENOMEM;
+ goto out_free_local;
+ }
+
+ /* Allocate MMIO virtual address array */
+ mmios_va = kcalloc(nr_mmios, sizeof(*mmios_va), GFP_KERNEL);
+ if (!mmios_va) {
+ rc = -ENOMEM;
+ goto out_iounmap;
+ }
+
+ /* Parse and map MMIO register sets */
+ for (i = 0; i < nr_mmios; i++) {
+ rc = imsic_get_mmio_resource(fwnode, i, &mmios[i]);
+ if (rc) {
+ pr_err("%pfwP: unable to parse MMIO regset %d\n",
+ fwnode, i);
+ goto out_iounmap;
+ }
+
+ base_addr = mmios[i].start;
+ base_addr &= ~(BIT(global->guest_index_bits +
+ global->hart_index_bits +
+ IMSIC_MMIO_PAGE_SHIFT) - 1);
+ base_addr &= ~((BIT(global->group_index_bits) - 1) <<
+ global->group_index_shift);
+ if (base_addr != global->base_addr) {
+ rc = -EINVAL;
+ pr_err("%pfwP: address mismatch for regset %d\n",
+ fwnode, i);
+ goto out_iounmap;
+ }
+
+ mmios_va[i] = ioremap(mmios[i].start, resource_size(&mmios[i]));
+ if (!mmios_va[i]) {
+ rc = -EIO;
+ pr_err("%pfwP: unable to map MMIO regset %d\n",
+ fwnode, i);
+ goto out_iounmap;
+ }
+ }
+
+ /* Initialize HW interrupt numbers */
+ rc = imsic_hwirqs_init();
+ if (rc) {
+ pr_err("%pfwP: failed to initialize HW interrupts numbers\n",
+ fwnode);
+ goto out_iounmap;
+ }
+
+ /* Initialize local (or per-CPU) state */
+ rc = imsic_local_init();
+ if (rc) {
+ pr_err("%pfwP: failed to initialize local state\n",
+ fwnode);
+ goto out_hwirqs_cleanup;
+ }
+
+ /* Configure handlers for target CPUs */
+ for (i = 0; i < nr_parent_irqs; i++) {
+ rc = imsic_get_parent_hartid(fwnode, i, &hartid);
+ if (rc) {
+ pr_warn("%pfwP: hart ID for parent irq%d not found\n",
+ fwnode, i);
+ continue;
+ }
+
+ cpu = riscv_hartid_to_cpuid(hartid);
+ if (cpu < 0) {
+ pr_warn("%pfwP: invalid cpuid for parent irq%d\n",
+ fwnode, i);
+ continue;
+ }
+
+ /* Find MMIO location of MSI page */
+ index = nr_mmios;
+ reloff = i * BIT(global->guest_index_bits) *
+ IMSIC_MMIO_PAGE_SZ;
+ for (j = 0; j < nr_mmios; j++) {
+ if (reloff < resource_size(&mmios[j])) {
+ index = j;
+ break;
+ }
+
+ /*
+ * MMIO region size may not be aligned to
+ * BIT(global->guest_index_bits) * IMSIC_MMIO_PAGE_SZ
+ * if holes are present.
+ */
+ reloff -= ALIGN(resource_size(&mmios[j]),
+ BIT(global->guest_index_bits) * IMSIC_MMIO_PAGE_SZ);
+ }
+ if (index >= nr_mmios) {
+ pr_warn("%pfwP: MMIO not found for parent irq%d\n",
+ fwnode, i);
+ continue;
+ }
+
+ local = per_cpu_ptr(global->local, cpu);
+ local->msi_pa = mmios[index].start + reloff;
+ local->msi_va = mmios_va[index] + reloff;
+
+ nr_handlers++;
+ }
+
+ /* If no CPU handlers found then can't take interrupts */
+ if (!nr_handlers) {
+ pr_err("%pfwP: No CPU handlers found\n", fwnode);
+ rc = -ENODEV;
+ goto out_local_cleanup;
+ }
+
+ /* Initialize matrix allocator */
+ rc = imsic_matrix_init();
+ if (rc) {
+ pr_err("%pfwP: failed to create matrix allocator\n",
+ fwnode);
+ goto out_local_cleanup;
+ }
+
+ /* We don't need MMIO arrays anymore so let's free-up */
+ kfree(mmios_va);
+ kfree(mmios);
+
+ return 0;
+
+out_local_cleanup:
+ imsic_local_cleanup();
+out_hwirqs_cleanup:
+ imsic_hwirqs_cleanup();
+out_iounmap:
+ for (i = 0; i < nr_mmios; i++) {
+ if (mmios_va[i])
+ iounmap(mmios_va[i]);
+ }
+ kfree(mmios_va);
+ kfree(mmios);
+out_free_local:
+ free_percpu(imsic->global.local);
+out_free_priv:
+ kfree(imsic);
+ imsic = NULL;
+ return rc;
+}
diff --git a/drivers/irqchip/irq-riscv-imsic-state.h b/drivers/irqchip/irq-riscv-imsic-state.h
new file mode 100644
index 000000000000..de83b649221c
--- /dev/null
+++ b/drivers/irqchip/irq-riscv-imsic-state.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2022 Ventana Micro Systems Inc.
+ */
+
+#ifndef _IRQ_RISCV_IMSIC_STATE_H
+#define _IRQ_RISCV_IMSIC_STATE_H
+
+#include <linux/irqchip/riscv-imsic.h>
+#include <linux/irqdomain.h>
+#include <linux/fwnode.h>
+
+/*
+ * The IMSIC driver uses 1 IPI for ID synchronization and
+ * arch/riscv/kernel/smp.c requires 6 IPIs so we fix the
+ * total number of IPIs to 8.
+ */
+#define IMSIC_IPI_ID 1
+#define IMSIC_NR_IPI 8
+
+struct imsic_vector {
+ /* Fixed details of the vector */
+ unsigned int cpu;
+ unsigned int local_id;
+ /* Details saved by driver in the vector */
+ unsigned int hwirq;
+};
+
+struct imsic_local_priv {
+ /* Local state of interrupt identities */
+ raw_spinlock_t ids_lock;
+ unsigned long *ids_enabled_bitmap;
+ struct imsic_vector **ids_move;
+
+ /* Local vector table */
+ struct imsic_vector *vectors;
+};
+
+struct imsic_priv {
+ /* Device details */
+ struct fwnode_handle *fwnode;
+
+ /* Global configuration common for all HARTs */
+ struct imsic_global_config global;
+
+ /* Dummy HW interrupt numbers */
+ unsigned int nr_hwirqs;
+ raw_spinlock_t hwirqs_lock;
+ unsigned long *hwirqs_used_bitmap;
+
+ /* Per-CPU state */
+ struct imsic_local_priv __percpu *lpriv;
+
+ /* State of IRQ matrix allocator */
+ raw_spinlock_t matrix_lock;
+ struct irq_matrix *matrix;
+
+ /* IPI interrupt identity and synchronization */
+ int ipi_virq;
+ struct irq_desc *ipi_lsync_desc;
+
+ /* IRQ domains (created by platform driver) */
+ struct irq_domain *base_domain;
+ struct irq_domain *plat_domain;
+};
+
+extern struct imsic_priv *imsic;
+
+void __imsic_eix_update(unsigned long base_id,
+ unsigned long num_id, bool pend, bool val);
+
+#define __imsic_id_set_enable(__id) \
+ __imsic_eix_update((__id), 1, false, true)
+#define __imsic_id_clear_enable(__id) \
+ __imsic_eix_update((__id), 1, false, false)
+
+void imsic_local_sync(void);
+void imsic_local_delivery(bool enable);
+
+void imsic_vector_mask(struct imsic_vector *vec);
+void imsic_vector_unmask(struct imsic_vector *vec);
+void imsic_vector_move(struct imsic_vector *old_vec,
+ struct imsic_vector *new_vec);
+
+struct imsic_vector *imsic_vector_from_local_id(unsigned int cpu,
+ unsigned int local_id);
+
+struct imsic_vector *imsic_vector_alloc(unsigned int hwirq,
+ const struct cpumask *mask);
+void imsic_vector_free(struct imsic_vector *vector);
+
+void imsic_vector_debug_show(struct seq_file *m,
+ struct imsic_vector *vec, int ind);
+
+void imsic_vector_debug_show_summary(struct seq_file *m, int ind);
+
+int imsic_hwirq_alloc(void);
+void imsic_hwirq_free(unsigned int hwirq);
+
+void imsic_state_online(void);
+void imsic_state_offline(void);
+int imsic_setup_state(struct fwnode_handle *fwnode);
+
+#endif
diff --git a/include/linux/irqchip/riscv-imsic.h b/include/linux/irqchip/riscv-imsic.h
new file mode 100644
index 000000000000..cbb7bcd0e4dd
--- /dev/null
+++ b/include/linux/irqchip/riscv-imsic.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2022 Ventana Micro Systems Inc.
+ */
+#ifndef __LINUX_IRQCHIP_RISCV_IMSIC_H
+#define __LINUX_IRQCHIP_RISCV_IMSIC_H
+
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <asm/csr.h>
+
+#define IMSIC_MMIO_PAGE_SHIFT 12
+#define IMSIC_MMIO_PAGE_SZ BIT(IMSIC_MMIO_PAGE_SHIFT)
+#define IMSIC_MMIO_PAGE_LE 0x00
+#define IMSIC_MMIO_PAGE_BE 0x04
+
+#define IMSIC_MIN_ID 63
+#define IMSIC_MAX_ID 2048
+
+#define IMSIC_EIDELIVERY 0x70
+
+#define IMSIC_EITHRESHOLD 0x72
+
+#define IMSIC_EIP0 0x80
+#define IMSIC_EIP63 0xbf
+#define IMSIC_EIPx_BITS 32
+
+#define IMSIC_EIE0 0xc0
+#define IMSIC_EIE63 0xff
+#define IMSIC_EIEx_BITS 32
+
+#define IMSIC_FIRST IMSIC_EIDELIVERY
+#define IMSIC_LAST IMSIC_EIE63
+
+#define IMSIC_MMIO_SETIPNUM_LE 0x00
+#define IMSIC_MMIO_SETIPNUM_BE 0x04
+
+struct imsic_local_config {
+ phys_addr_t msi_pa;
+ void __iomem *msi_va;
+};
+
+struct imsic_global_config {
+ /*
+ * MSI Target Address Scheme
+ *
+ * XLEN-1 12 0
+ * | | |
+ * -------------------------------------------------------------
+ * |xxxxxx|Group Index|xxxxxxxxxxx|HART Index|Guest Index| 0 |
+ * -------------------------------------------------------------
+ */
+
+ /* Bits representing Guest index, HART index, and Group index */
+ u32 guest_index_bits;
+ u32 hart_index_bits;
+ u32 group_index_bits;
+ u32 group_index_shift;
+
+ /* Global base address matching all target MSI addresses */
+ phys_addr_t base_addr;
+
+ /* Number of interrupt identities */
+ u32 nr_ids;
+
+ /* Number of guest interrupt identities */
+ u32 nr_guest_ids;
+
+ /* Per-CPU IMSIC addresses */
+ struct imsic_local_config __percpu *local;
+};
+
+#ifdef CONFIG_RISCV_IMSIC
+
+extern const struct imsic_global_config *imsic_get_global_config(void);
+
+#else
+
+static inline const struct imsic_global_config *imsic_get_global_config(void)
+{
+ return NULL;
+}
+
+#endif
+
+#endif
--
2.34.1
The Linux platform MSI support allows per-device MSI domains, so let
us add a platform irqchip driver for the RISC-V IMSIC which provides a
base IRQ domain with MSI parent support for platform-device MSI domains.
This driver assumes that the IMSIC state is already initialized by
the IMSIC early driver.
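As a usage sketch (the consumer names below are purely illustrative and
the exact platform MSI helper may differ with the per-device MSI domain
conversion this series is based on), a platform device driver would
allocate its MSIs roughly as:

        static void my_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
        {
                /*
                 * Program msg->address_hi/lo and msg->data into the
                 * device's own MSI generation registers; the message
                 * itself is composed by the IMSIC base domain.
                 */
        }

        rc = platform_msi_domain_alloc_irqs(&pdev->dev, nvec, my_write_msi_msg);
        if (rc)
                return rc;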
Signed-off-by: Anup Patel <[email protected]>
---
drivers/irqchip/Makefile | 2 +-
drivers/irqchip/irq-riscv-imsic-platform.c | 371 +++++++++++++++++++++
drivers/irqchip/irq-riscv-imsic-state.h | 2 +-
3 files changed, 373 insertions(+), 2 deletions(-)
create mode 100644 drivers/irqchip/irq-riscv-imsic-platform.c
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index d714724387ce..abca445a3229 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -95,7 +95,7 @@ obj-$(CONFIG_QCOM_MPM) += irq-qcom-mpm.o
obj-$(CONFIG_CSKY_MPINTC) += irq-csky-mpintc.o
obj-$(CONFIG_CSKY_APB_INTC) += irq-csky-apb-intc.o
obj-$(CONFIG_RISCV_INTC) += irq-riscv-intc.o
-obj-$(CONFIG_RISCV_IMSIC) += irq-riscv-imsic-state.o irq-riscv-imsic-early.o
+obj-$(CONFIG_RISCV_IMSIC) += irq-riscv-imsic-state.o irq-riscv-imsic-early.o irq-riscv-imsic-platform.o
obj-$(CONFIG_SIFIVE_PLIC) += irq-sifive-plic.o
obj-$(CONFIG_IMX_IRQSTEER) += irq-imx-irqsteer.o
obj-$(CONFIG_IMX_INTMUX) += irq-imx-intmux.o
diff --git a/drivers/irqchip/irq-riscv-imsic-platform.c b/drivers/irqchip/irq-riscv-imsic-platform.c
new file mode 100644
index 000000000000..65791a6b0727
--- /dev/null
+++ b/drivers/irqchip/irq-riscv-imsic-platform.c
@@ -0,0 +1,371 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2022 Ventana Micro Systems Inc.
+ */
+
+#define pr_fmt(fmt) "riscv-imsic: " fmt
+#include <linux/bitmap.h>
+#include <linux/cpu.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/irqchip.h>
+#include <linux/irqdomain.h>
+#include <linux/module.h>
+#include <linux/msi.h>
+#include <linux/platform_device.h>
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+
+#include "irq-riscv-imsic-state.h"
+
+static int imsic_cpu_page_phys(unsigned int cpu,
+ unsigned int guest_index,
+ phys_addr_t *out_msi_pa)
+{
+ struct imsic_global_config *global;
+ struct imsic_local_config *local;
+
+ global = &imsic->global;
+ local = per_cpu_ptr(global->local, cpu);
+
+ if (BIT(global->guest_index_bits) <= guest_index)
+ return -EINVAL;
+
+ if (out_msi_pa)
+ *out_msi_pa = local->msi_pa +
+ (guest_index * IMSIC_MMIO_PAGE_SZ);
+
+ return 0;
+}
+
+static void imsic_irq_mask(struct irq_data *d)
+{
+ imsic_vector_mask(irq_data_get_irq_chip_data(d));
+}
+
+static void imsic_irq_unmask(struct irq_data *d)
+{
+ imsic_vector_unmask(irq_data_get_irq_chip_data(d));
+}
+
+static int imsic_irq_retrigger(struct irq_data *d)
+{
+ struct imsic_vector *vec = irq_data_get_irq_chip_data(d);
+ struct imsic_local_config *local;
+
+ if (WARN_ON(vec == NULL))
+ return -ENOENT;
+
+ local = per_cpu_ptr(imsic->global.local, vec->cpu);
+ writel(vec->local_id, local->msi_va);
+ return 0;
+}
+
+static void imsic_irq_compose_vector_msg(struct imsic_vector *vec,
+ struct msi_msg *msg)
+{
+ phys_addr_t msi_addr;
+ int err;
+
+ if (WARN_ON(vec == NULL))
+ return;
+
+ err = imsic_cpu_page_phys(vec->cpu, 0, &msi_addr);
+ if (WARN_ON(err))
+ return;
+
+ msg->address_hi = upper_32_bits(msi_addr);
+ msg->address_lo = lower_32_bits(msi_addr);
+ msg->data = vec->local_id;
+}
+
+static void imsic_irq_compose_msg(struct irq_data *d, struct msi_msg *msg)
+{
+ imsic_irq_compose_vector_msg(irq_data_get_irq_chip_data(d), msg);
+}
+
+#ifdef CONFIG_SMP
+static void imsic_msi_update_msg(struct irq_data *d, struct imsic_vector *vec)
+{
+ struct msi_msg msg[2] = { [1] = { }, };
+
+ imsic_irq_compose_vector_msg(vec, msg);
+ irq_data_get_irq_chip(d)->irq_write_msi_msg(d, msg);
+}
+
+static int imsic_irq_set_affinity(struct irq_data *d,
+ const struct cpumask *mask_val,
+ bool force)
+{
+ struct imsic_vector *old_vec, *new_vec;
+ struct irq_data *pd = d->parent_data;
+
+ old_vec = irq_data_get_irq_chip_data(pd);
+ if (WARN_ON(old_vec == NULL))
+ return -ENOENT;
+
+ /* Get a new vector on the desired set of CPUs */
+ new_vec = imsic_vector_alloc(old_vec->hwirq, mask_val);
+ if (!new_vec)
+ return -ENOSPC;
+
+ /* If old vector belongs to the desired CPU then do nothing */
+ if (old_vec->cpu == new_vec->cpu) {
+ imsic_vector_free(new_vec);
+ return IRQ_SET_MASK_OK_DONE;
+ }
+
+ /* Point device to the new vector */
+ imsic_msi_update_msg(d, new_vec);
+
+ /* Update irq descriptors with the new vector */
+ pd->chip_data = new_vec;
+
+ /* Update effective affinity of parent irq data */
+ irq_data_update_effective_affinity(pd, cpumask_of(new_vec->cpu));
+
+ /* Move state of the old vector to the new vector */
+ imsic_vector_move(old_vec, new_vec);
+
+ return IRQ_SET_MASK_OK_DONE;
+}
+#endif
+
+static struct irq_chip imsic_irq_base_chip = {
+ .name = "IMSIC",
+ .irq_mask = imsic_irq_mask,
+ .irq_unmask = imsic_irq_unmask,
+ .irq_retrigger = imsic_irq_retrigger,
+ .irq_compose_msi_msg = imsic_irq_compose_msg,
+ .flags = IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_MASK_ON_SUSPEND,
+};
+
+static int imsic_irq_domain_alloc(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs,
+ void *args)
+{
+ struct imsic_vector *vec;
+ int hwirq;
+
+ /* Legacy-MSI or multi-MSI not supported yet. */
+ if (nr_irqs > 1)
+ return -ENOTSUPP;
+
+ hwirq = imsic_hwirq_alloc();
+ if (hwirq < 0)
+ return hwirq;
+
+ vec = imsic_vector_alloc(hwirq, cpu_online_mask);
+ if (!vec) {
+ imsic_hwirq_free(hwirq);
+ return -ENOSPC;
+ }
+
+ irq_domain_set_info(domain, virq, hwirq,
+ &imsic_irq_base_chip, vec,
+ handle_simple_irq, NULL, NULL);
+ irq_set_noprobe(virq);
+ irq_set_affinity(virq, cpu_online_mask);
+
+ /*
+ * IMSIC does not implement irq_disable() so Linux interrupt
+ * subsystem will take a lazy approach for disabling an IMSIC
+ * interrupt. This means IMSIC interrupts are left unmasked
+ * upon system suspend and interrupts are not processed
+ * immediately upon system wake up. To tackle this, we disable
+ * the lazy approach for all IMSIC interrupts.
+ */
+ irq_set_status_flags(virq, IRQ_DISABLE_UNLAZY);
+
+ return 0;
+}
+
+static void imsic_irq_domain_free(struct irq_domain *domain,
+ unsigned int virq,
+ unsigned int nr_irqs)
+{
+ struct irq_data *d = irq_domain_get_irq_data(domain, virq);
+
+ imsic_vector_free(irq_data_get_irq_chip_data(d));
+ imsic_hwirq_free(d->hwirq);
+ irq_domain_free_irqs_parent(domain, virq, nr_irqs);
+}
+
+static int imsic_irq_domain_select(struct irq_domain *domain,
+ struct irq_fwspec *fwspec,
+ enum irq_domain_bus_token bus_token)
+{
+ const struct msi_parent_ops *ops = domain->msi_parent_ops;
+ u32 busmask = BIT(bus_token);
+
+ if (fwspec->fwnode != domain->fwnode || fwspec->param_count != 0)
+ return 0;
+
+ /* Handle pure domain searches */
+ if (bus_token == ops->bus_select_token)
+ return 1;
+
+ return !!(ops->bus_select_mask & busmask);
+}
+
+#ifdef CONFIG_GENERIC_IRQ_DEBUGFS
+static void imsic_irq_debug_show(struct seq_file *m, struct irq_domain *d,
+ struct irq_data *irqd, int ind)
+{
+ if (!irqd) {
+ imsic_vector_debug_show_summary(m, ind);
+ return;
+ }
+
+ imsic_vector_debug_show(m, irq_data_get_irq_chip_data(irqd), ind);
+}
+#endif
+
+static const struct irq_domain_ops imsic_base_domain_ops = {
+ .alloc = imsic_irq_domain_alloc,
+ .free = imsic_irq_domain_free,
+ .select = imsic_irq_domain_select,
+#ifdef CONFIG_GENERIC_IRQ_DEBUGFS
+ .debug_show = imsic_irq_debug_show,
+#endif
+};
+
+static bool imsic_init_dev_msi_info(struct device *dev,
+ struct irq_domain *domain,
+ struct irq_domain *real_parent,
+ struct msi_domain_info *info)
+{
+ const struct msi_parent_ops *pops = real_parent->msi_parent_ops;
+
+ /* MSI parent domain specific settings */
+ switch (real_parent->bus_token) {
+ case DOMAIN_BUS_NEXUS:
+ if (WARN_ON_ONCE(domain != real_parent))
+ return false;
+#ifdef CONFIG_SMP
+ info->chip->irq_set_affinity = imsic_irq_set_affinity;
+#endif
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
+
+ /* Is the target supported? */
+ switch (info->bus_token) {
+ case DOMAIN_BUS_DEVICE_IMS:
+ /*
+ * Per device IMS should never have any MSI feature bits
+ * set. Its sole purpose is to create a dumb interrupt
+ * chip which has a device specific irq_write_msi_msg()
+ * callback.
+ */
+ if (WARN_ON_ONCE(info->flags))
+ return false;
+
+ /* Core managed MSI descriptors */
+ info->flags |= MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS |
+ MSI_FLAG_FREE_MSI_DESCS;
+ break;
+ case DOMAIN_BUS_WIRED_TO_MSI:
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
+
+ /* Use hierarchical chip operations for re-trigger */
+ info->chip->irq_retrigger = irq_chip_retrigger_hierarchy;
+
+ /*
+ * Mask out the domain specific MSI feature flags which are not
+ * supported by the real parent.
+ */
+ info->flags &= pops->supported_flags;
+
+ /* Enforce the required flags */
+ info->flags |= pops->required_flags;
+
+ return true;
+}
+
+#define MATCH_PLATFORM_MSI BIT(DOMAIN_BUS_PLATFORM_MSI)
+
+static const struct msi_parent_ops imsic_msi_parent_ops = {
+ .supported_flags = MSI_GENERIC_FLAGS_MASK,
+ .required_flags = MSI_FLAG_USE_DEF_DOM_OPS |
+ MSI_FLAG_USE_DEF_CHIP_OPS,
+ .bus_select_token = DOMAIN_BUS_NEXUS,
+ .bus_select_mask = MATCH_PLATFORM_MSI,
+ .init_dev_msi_info = imsic_init_dev_msi_info,
+};
+
+int imsic_irqdomain_init(void)
+{
+ struct imsic_global_config *global;
+
+ if (!imsic || !imsic->fwnode) {
+ pr_err("early driver not probed\n");
+ return -ENODEV;
+ }
+
+ if (imsic->base_domain) {
+ pr_err("%pfwP: irq domain already created\n", imsic->fwnode);
+ return -ENODEV;
+ }
+
+ global = &imsic->global;
+
+ /* Create Base IRQ domain */
+ imsic->base_domain = irq_domain_create_tree(imsic->fwnode,
+ &imsic_base_domain_ops, imsic);
+ if (!imsic->base_domain) {
+ pr_err("%pfwP: failed to create IMSIC base domain\n",
+ imsic->fwnode);
+ return -ENOMEM;
+ }
+ imsic->base_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
+ imsic->base_domain->msi_parent_ops = &imsic_msi_parent_ops;
+
+ irq_domain_update_bus_token(imsic->base_domain, DOMAIN_BUS_NEXUS);
+
+ pr_info("%pfwP: hart-index-bits: %d, guest-index-bits: %d\n",
+ imsic->fwnode, global->hart_index_bits, global->guest_index_bits);
+ pr_info("%pfwP: group-index-bits: %d, group-index-shift: %d\n",
+ imsic->fwnode, global->group_index_bits, global->group_index_shift);
+ pr_info("%pfwP: per-CPU IDs %d at base PPN %pa\n",
+ imsic->fwnode, global->nr_ids, &global->base_addr);
+ pr_info("%pfwP: total %d interrupts available\n",
+ imsic->fwnode, imsic->nr_hwirqs);
+
+ return 0;
+}
+
+static int imsic_platform_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+
+ if (imsic && imsic->fwnode != dev->fwnode) {
+ dev_err(dev, "fwnode mismatch\n");
+ return -ENODEV;
+ }
+
+ return imsic_irqdomain_init();
+}
+
+static const struct of_device_id imsic_platform_match[] = {
+ { .compatible = "riscv,imsics" },
+ {}
+};
+
+static struct platform_driver imsic_platform_driver = {
+ .driver = {
+ .name = "riscv-imsic",
+ .of_match_table = imsic_platform_match,
+ },
+ .probe = imsic_platform_probe,
+};
+builtin_platform_driver(imsic_platform_driver);
diff --git a/drivers/irqchip/irq-riscv-imsic-state.h b/drivers/irqchip/irq-riscv-imsic-state.h
index de83b649221c..c76cab08bf78 100644
--- a/drivers/irqchip/irq-riscv-imsic-state.h
+++ b/drivers/irqchip/irq-riscv-imsic-state.h
@@ -62,7 +62,6 @@ struct imsic_priv {
/* IRQ domains (created by platform driver) */
struct irq_domain *base_domain;
- struct irq_domain *plat_domain;
};
extern struct imsic_priv *imsic;
@@ -101,5 +100,6 @@ void imsic_hwirq_free(unsigned int hwirq);
void imsic_state_online(void);
void imsic_state_offline(void);
int imsic_setup_state(struct fwnode_handle *fwnode);
+int imsic_irqdomain_init(void);
#endif
--
2.34.1
The Linux PCI framework supports per-device MSI domains for PCI devices,
so let us extend the IMSIC driver to also provide per-device MSI domains
for PCI devices.
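From the PCI driver's point of view nothing changes; MSI/MSI-X vectors
are allocated with the usual helpers and simply end up in a per-device
MSI domain parented by the IMSIC base domain. An illustrative sketch
(my_handler and mydata are placeholders):

        /*
         * Request up to 8 MSI-X vectors; each allocated vector is backed
         * by an IMSIC interrupt identity behind the scenes.
         */
        nvecs = pci_alloc_irq_vectors(pdev, 1, 8, PCI_IRQ_MSIX);
        if (nvecs < 0)
                return nvecs;

        rc = request_irq(pci_irq_vector(pdev, 0), my_handler, 0,
                         "mydev", mydata);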
Signed-off-by: Anup Patel <[email protected]>
---
drivers/irqchip/Kconfig | 7 +++++
drivers/irqchip/irq-riscv-imsic-platform.c | 36 ++++++++++++++++++++--
2 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 85f86e31c996..2fc0cb32341a 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -553,6 +553,13 @@ config RISCV_IMSIC
select GENERIC_IRQ_MATRIX_ALLOCATOR
select GENERIC_MSI_IRQ
+config RISCV_IMSIC_PCI
+ bool
+ depends on RISCV_IMSIC
+ depends on PCI
+ depends on PCI_MSI
+ default RISCV_IMSIC
+
config EXYNOS_IRQ_COMBINER
bool "Samsung Exynos IRQ combiner support" if COMPILE_TEST
depends on (ARCH_EXYNOS && ARM) || COMPILE_TEST
diff --git a/drivers/irqchip/irq-riscv-imsic-platform.c b/drivers/irqchip/irq-riscv-imsic-platform.c
index 65791a6b0727..d78c93e2cf2b 100644
--- a/drivers/irqchip/irq-riscv-imsic-platform.c
+++ b/drivers/irqchip/irq-riscv-imsic-platform.c
@@ -14,6 +14,7 @@
#include <linux/irqdomain.h>
#include <linux/module.h>
#include <linux/msi.h>
+#include <linux/pci.h>
#include <linux/platform_device.h>
#include <linux/spinlock.h>
#include <linux/smp.h>
@@ -233,6 +234,28 @@ static const struct irq_domain_ops imsic_base_domain_ops = {
#endif
};
+#ifdef CONFIG_RISCV_IMSIC_PCI
+
+static void imsic_pci_mask_irq(struct irq_data *d)
+{
+ pci_msi_mask_irq(d);
+ irq_chip_mask_parent(d);
+}
+
+static void imsic_pci_unmask_irq(struct irq_data *d)
+{
+ pci_msi_unmask_irq(d);
+ irq_chip_unmask_parent(d);
+}
+
+#define MATCH_PCI_MSI BIT(DOMAIN_BUS_PCI_MSI)
+
+#else
+
+#define MATCH_PCI_MSI 0
+
+#endif
+
static bool imsic_init_dev_msi_info(struct device *dev,
struct irq_domain *domain,
struct irq_domain *real_parent,
@@ -242,6 +265,7 @@ static bool imsic_init_dev_msi_info(struct device *dev,
/* MSI parent domain specific settings */
switch (real_parent->bus_token) {
+ case DOMAIN_BUS_PCI_MSI:
case DOMAIN_BUS_NEXUS:
if (WARN_ON_ONCE(domain != real_parent))
return false;
@@ -256,6 +280,13 @@ static bool imsic_init_dev_msi_info(struct device *dev,
/* Is the target supported? */
switch (info->bus_token) {
+#ifdef CONFIG_RISCV_IMSIC_PCI
+ case DOMAIN_BUS_PCI_DEVICE_MSI:
+ case DOMAIN_BUS_PCI_DEVICE_MSIX:
+ info->chip->irq_mask = imsic_pci_mask_irq;
+ info->chip->irq_unmask = imsic_pci_unmask_irq;
+ break;
+#endif
case DOMAIN_BUS_DEVICE_IMS:
/*
* Per device IMS should never have any MSI feature bits
@@ -295,11 +326,12 @@ static bool imsic_init_dev_msi_info(struct device *dev,
#define MATCH_PLATFORM_MSI BIT(DOMAIN_BUS_PLATFORM_MSI)
static const struct msi_parent_ops imsic_msi_parent_ops = {
- .supported_flags = MSI_GENERIC_FLAGS_MASK,
+ .supported_flags = MSI_GENERIC_FLAGS_MASK |
+ MSI_FLAG_PCI_MSIX,
.required_flags = MSI_FLAG_USE_DEF_DOM_OPS |
MSI_FLAG_USE_DEF_CHIP_OPS,
.bus_select_token = DOMAIN_BUS_NEXUS,
- .bus_select_mask = MATCH_PLATFORM_MSI,
+ .bus_select_mask = MATCH_PCI_MSI | MATCH_PLATFORM_MSI,
.init_dev_msi_info = imsic_init_dev_msi_info,
};
--
2.34.1
The RISC-V advanced interrupt architecture (AIA) specification defines
the advanced platform-level interrupt controller (APLIC) which has two
modes of operation: 1) Direct mode and 2) MSI mode.
(For more details, refer https://github.com/riscv/riscv-aia)
In APLIC direct-mode, wired interrupts are forwarded to CPUs (or HARTs)
as local external interrupts.
We add a platform irqchip driver for the RISC-V APLIC direct-mode to
support RISC-V platforms having only wired interrupts.
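For reference, routing a wired input to a HART in direct mode amounts to
programming that input's target register with a HART index and priority;
the sketch below mirrors the set_affinity path in the driver:

        void __iomem *target = priv->regs + APLIC_TARGET_BASE +
                               (d->hwirq - 1) * sizeof(u32);
        u32 val;

        /*
         * Route the wired input to the chosen HART index at the default
         * priority; the HART then receives it as its local external
         * interrupt.
         */
        val = (idc->hart_index & APLIC_TARGET_HART_IDX_MASK) <<
              APLIC_TARGET_HART_IDX_SHIFT;
        val |= APLIC_DEFAULT_PRIORITY;
        writel(val, target);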
Signed-off-by: Anup Patel <[email protected]>
---
drivers/irqchip/Kconfig | 5 +
drivers/irqchip/Makefile | 1 +
drivers/irqchip/irq-riscv-aplic-direct.c | 343 +++++++++++++++++++++++
drivers/irqchip/irq-riscv-aplic-main.c | 232 +++++++++++++++
drivers/irqchip/irq-riscv-aplic-main.h | 45 +++
include/linux/irqchip/riscv-aplic.h | 119 ++++++++
6 files changed, 745 insertions(+)
create mode 100644 drivers/irqchip/irq-riscv-aplic-direct.c
create mode 100644 drivers/irqchip/irq-riscv-aplic-main.c
create mode 100644 drivers/irqchip/irq-riscv-aplic-main.h
create mode 100644 include/linux/irqchip/riscv-aplic.h
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 2fc0cb32341a..dbc8811d3764 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -546,6 +546,11 @@ config SIFIVE_PLIC
select IRQ_DOMAIN_HIERARCHY
select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
+config RISCV_APLIC
+ bool
+ depends on RISCV
+ select IRQ_DOMAIN_HIERARCHY
+
config RISCV_IMSIC
bool
depends on RISCV
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index abca445a3229..7f8289790ed8 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -95,6 +95,7 @@ obj-$(CONFIG_QCOM_MPM) += irq-qcom-mpm.o
obj-$(CONFIG_CSKY_MPINTC) += irq-csky-mpintc.o
obj-$(CONFIG_CSKY_APB_INTC) += irq-csky-apb-intc.o
obj-$(CONFIG_RISCV_INTC) += irq-riscv-intc.o
+obj-$(CONFIG_RISCV_APLIC) += irq-riscv-aplic-main.o irq-riscv-aplic-direct.o
obj-$(CONFIG_RISCV_IMSIC) += irq-riscv-imsic-state.o irq-riscv-imsic-early.o irq-riscv-imsic-platform.o
obj-$(CONFIG_SIFIVE_PLIC) += irq-sifive-plic.o
obj-$(CONFIG_IMX_IRQSTEER) += irq-imx-irqsteer.o
diff --git a/drivers/irqchip/irq-riscv-aplic-direct.c b/drivers/irqchip/irq-riscv-aplic-direct.c
new file mode 100644
index 000000000000..9ed2666bfb5e
--- /dev/null
+++ b/drivers/irqchip/irq-riscv-aplic-direct.c
@@ -0,0 +1,343 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2022 Ventana Micro Systems Inc.
+ */
+
+#include <linux/bitops.h>
+#include <linux/cpu.h>
+#include <linux/interrupt.h>
+#include <linux/irqchip.h>
+#include <linux/irqchip/chained_irq.h>
+#include <linux/irqchip/riscv-aplic.h>
+#include <linux/module.h>
+#include <linux/of_address.h>
+#include <linux/printk.h>
+#include <linux/smp.h>
+
+#include "irq-riscv-aplic-main.h"
+
+#define APLIC_DISABLE_IDELIVERY 0
+#define APLIC_ENABLE_IDELIVERY 1
+#define APLIC_DISABLE_ITHRESHOLD 1
+#define APLIC_ENABLE_ITHRESHOLD 0
+
+struct aplic_direct {
+ struct aplic_priv priv;
+ struct irq_domain *irqdomain;
+ struct cpumask lmask;
+};
+
+struct aplic_idc {
+ unsigned int hart_index;
+ void __iomem *regs;
+ struct aplic_direct *direct;
+};
+
+static unsigned int aplic_direct_parent_irq;
+static DEFINE_PER_CPU(struct aplic_idc, aplic_idcs);
+
+static void aplic_direct_irq_eoi(struct irq_data *d)
+{
+ /*
+ * The fasteoi_handler requires an irq_eoi() callback, hence
+ * provide a dummy handler.
+ */
+}
+
+#ifdef CONFIG_SMP
+static int aplic_direct_set_affinity(struct irq_data *d,
+ const struct cpumask *mask_val, bool force)
+{
+ struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
+ struct aplic_direct *direct =
+ container_of(priv, struct aplic_direct, priv);
+ struct aplic_idc *idc;
+ unsigned int cpu, val;
+ struct cpumask amask;
+ void __iomem *target;
+
+ cpumask_and(&amask, &direct->lmask, mask_val);
+
+ if (force)
+ cpu = cpumask_first(&amask);
+ else
+ cpu = cpumask_any_and(&amask, cpu_online_mask);
+
+ if (cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ idc = per_cpu_ptr(&aplic_idcs, cpu);
+ target = priv->regs + APLIC_TARGET_BASE;
+ target += (d->hwirq - 1) * sizeof(u32);
+ val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
+ val <<= APLIC_TARGET_HART_IDX_SHIFT;
+ val |= APLIC_DEFAULT_PRIORITY;
+ writel(val, target);
+
+ irq_data_update_effective_affinity(d, cpumask_of(cpu));
+
+ return IRQ_SET_MASK_OK_DONE;
+}
+#endif
+
+static struct irq_chip aplic_direct_chip = {
+ .name = "APLIC-DIRECT",
+ .irq_mask = aplic_irq_mask,
+ .irq_unmask = aplic_irq_unmask,
+ .irq_set_type = aplic_irq_set_type,
+ .irq_eoi = aplic_direct_irq_eoi,
+#ifdef CONFIG_SMP
+ .irq_set_affinity = aplic_direct_set_affinity,
+#endif
+ .flags = IRQCHIP_SET_TYPE_MASKED |
+ IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_MASK_ON_SUSPEND,
+};
+
+static int aplic_direct_irqdomain_translate(struct irq_domain *d,
+ struct irq_fwspec *fwspec,
+ unsigned long *hwirq,
+ unsigned int *type)
+{
+ struct aplic_priv *priv = d->host_data;
+
+ return aplic_irqdomain_translate(fwspec, priv->gsi_base,
+ hwirq, type);
+}
+
+static int aplic_direct_irqdomain_alloc(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs,
+ void *arg)
+{
+ int i, ret;
+ unsigned int type;
+ irq_hw_number_t hwirq;
+ struct irq_fwspec *fwspec = arg;
+ struct aplic_priv *priv = domain->host_data;
+ struct aplic_direct *direct =
+ container_of(priv, struct aplic_direct, priv);
+
+ ret = aplic_irqdomain_translate(fwspec, priv->gsi_base,
+ &hwirq, &type);
+ if (ret)
+ return ret;
+
+ for (i = 0; i < nr_irqs; i++) {
+ irq_domain_set_info(domain, virq + i, hwirq + i,
+ &aplic_direct_chip, priv,
+ handle_fasteoi_irq, NULL, NULL);
+ irq_set_affinity(virq + i, &direct->lmask);
+ /* See the reason described in aplic_msi_irqdomain_alloc() */
+ irq_set_status_flags(virq + i, IRQ_DISABLE_UNLAZY);
+ }
+
+ return 0;
+}
+
+static const struct irq_domain_ops aplic_direct_irqdomain_ops = {
+ .translate = aplic_direct_irqdomain_translate,
+ .alloc = aplic_direct_irqdomain_alloc,
+ .free = irq_domain_free_irqs_top,
+};
+
+/*
+ * To handle APLIC direct interrupts, we just read the CLAIMI register,
+ * which returns the highest-priority pending interrupt and clears its
+ * pending bit. This process is repeated until the CLAIMI register
+ * returns zero.
+ */
+static void aplic_direct_handle_irq(struct irq_desc *desc)
+{
+ struct aplic_idc *idc = this_cpu_ptr(&aplic_idcs);
+ struct irq_chip *chip = irq_desc_get_chip(desc);
+ struct irq_domain *irqdomain = idc->direct->irqdomain;
+ irq_hw_number_t hw_irq;
+ int irq;
+
+ chained_irq_enter(chip, desc);
+
+ while ((hw_irq = readl(idc->regs + APLIC_IDC_CLAIMI))) {
+ hw_irq = hw_irq >> APLIC_IDC_TOPI_ID_SHIFT;
+ irq = irq_find_mapping(irqdomain, hw_irq);
+
+ if (unlikely(irq <= 0))
+ dev_warn_ratelimited(idc->direct->priv.dev,
+ "hw_irq %lu mapping not found\n",
+ hw_irq);
+ else
+ generic_handle_irq(irq);
+ }
+
+ chained_irq_exit(chip, desc);
+}
+
+static void aplic_idc_set_delivery(struct aplic_idc *idc, bool en)
+{
+ u32 de = (en) ? APLIC_ENABLE_IDELIVERY : APLIC_DISABLE_IDELIVERY;
+ u32 th = (en) ? APLIC_ENABLE_ITHRESHOLD : APLIC_DISABLE_ITHRESHOLD;
+
+ /* Priority must be less than threshold for interrupt triggering */
+ writel(th, idc->regs + APLIC_IDC_ITHRESHOLD);
+
+ /* Delivery must be set to 1 for interrupt triggering */
+ writel(de, idc->regs + APLIC_IDC_IDELIVERY);
+}
+
+static int aplic_direct_dying_cpu(unsigned int cpu)
+{
+ if (aplic_direct_parent_irq)
+ disable_percpu_irq(aplic_direct_parent_irq);
+
+ return 0;
+}
+
+static int aplic_direct_starting_cpu(unsigned int cpu)
+{
+ if (aplic_direct_parent_irq)
+ enable_percpu_irq(aplic_direct_parent_irq,
+ irq_get_trigger_type(aplic_direct_parent_irq));
+
+ return 0;
+}
+
+static int aplic_direct_parse_parent_hwirq(struct device *dev,
+ u32 index, u32 *parent_hwirq,
+ unsigned long *parent_hartid)
+{
+ struct of_phandle_args parent;
+ int rc;
+
+ /*
+ * Currently, only an OF fwnode is supported; extend this
+ * function when adding ACPI support.
+ */
+ if (!is_of_node(dev->fwnode))
+ return -EINVAL;
+
+ rc = of_irq_parse_one(to_of_node(dev->fwnode), index, &parent);
+ if (rc)
+ return rc;
+
+ rc = riscv_of_parent_hartid(parent.np, parent_hartid);
+ if (rc)
+ return rc;
+
+ *parent_hwirq = parent.args[0];
+ return 0;
+}
+
+int aplic_direct_setup(struct device *dev, void __iomem *regs)
+{
+ int i, j, rc, cpu, setup_count = 0;
+ struct aplic_direct *direct;
+ struct aplic_priv *priv;
+ struct irq_domain *domain;
+ unsigned long hartid;
+ struct aplic_idc *idc;
+ u32 val, hwirq;
+
+ direct = kzalloc(sizeof(*direct), GFP_KERNEL);
+ if (!direct)
+ return -ENOMEM;
+ priv = &direct->priv;
+
+ rc = aplic_setup_priv(priv, dev, regs);
+ if (rc) {
+ dev_err(dev, "failed to create APLIC context\n");
+ kfree(direct);
+ return rc;
+ }
+
+ /* Setup per-CPU IDC and target CPU mask */
+ for (i = 0; i < priv->nr_idcs; i++) {
+ rc = aplic_direct_parse_parent_hwirq(dev, i, &hwirq, &hartid);
+ if (rc) {
+ dev_warn(dev, "parent irq for IDC%d not found\n", i);
+ continue;
+ }
+
+ /*
+ * Skip interrupts other than external interrupts for
+ * current privilege level.
+ */
+ if (hwirq != RV_IRQ_EXT)
+ continue;
+
+ cpu = riscv_hartid_to_cpuid(hartid);
+ if (cpu < 0) {
+ dev_warn(dev, "invalid cpuid for IDC%d\n", i);
+ continue;
+ }
+
+ cpumask_set_cpu(cpu, &direct->lmask);
+
+ idc = per_cpu_ptr(&aplic_idcs, cpu);
+ idc->hart_index = i;
+ idc->regs = priv->regs + APLIC_IDC_BASE + i * APLIC_IDC_SIZE;
+ idc->direct = direct;
+
+ aplic_idc_set_delivery(idc, true);
+
+ /*
+ * The boot CPU might not have APLIC hart_index = 0, so check
+ * and update the target registers of all interrupts.
+ */
+ if (cpu == smp_processor_id() && idc->hart_index) {
+ val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
+ val <<= APLIC_TARGET_HART_IDX_SHIFT;
+ val |= APLIC_DEFAULT_PRIORITY;
+ for (j = 1; j <= priv->nr_irqs; j++)
+ writel(val, priv->regs + APLIC_TARGET_BASE +
+ (j - 1) * sizeof(u32));
+ }
+
+ setup_count++;
+ }
+
+ /* Find parent domain and register chained handler */
+ domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
+ DOMAIN_BUS_ANY);
+ if (!aplic_direct_parent_irq && domain) {
+ aplic_direct_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
+ if (aplic_direct_parent_irq) {
+ irq_set_chained_handler(aplic_direct_parent_irq,
+ aplic_direct_handle_irq);
+
+ /*
+ * Setup CPUHP notifier to enable parent
+ * interrupt on all CPUs
+ */
+ cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+ "irqchip/riscv/aplic:starting",
+ aplic_direct_starting_cpu,
+ aplic_direct_dying_cpu);
+ }
+ }
+
+ /* Fail if we were not able to setup IDC for any CPU */
+ if (!setup_count) {
+ kfree(direct);
+ return -ENODEV;
+ }
+
+ /* Setup global config and interrupt delivery */
+ aplic_init_hw_global(priv, false);
+
+ /* Create irq domain instance for the APLIC */
+ direct->irqdomain = irq_domain_create_linear(dev->fwnode,
+ priv->nr_irqs + 1,
+ &aplic_direct_irqdomain_ops,
+ priv);
+ if (!direct->irqdomain) {
+ dev_err(dev, "failed to create direct irq domain\n");
+ kfree(direct);
+ return -ENOMEM;
+ }
+
+ /* Advertise the interrupt controller */
+ dev_info(dev, "%d interrupts directly connected to %d CPUs\n",
+ priv->nr_irqs, priv->nr_idcs);
+
+ return 0;
+}
diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
new file mode 100644
index 000000000000..87450708a733
--- /dev/null
+++ b/drivers/irqchip/irq-riscv-aplic-main.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2022 Ventana Micro Systems Inc.
+ */
+
+#include <linux/of.h>
+#include <linux/of_irq.h>
+#include <linux/printk.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/irqchip/riscv-aplic.h>
+
+#include "irq-riscv-aplic-main.h"
+
+void aplic_irq_unmask(struct irq_data *d)
+{
+ struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
+
+ writel(d->hwirq, priv->regs + APLIC_SETIENUM);
+}
+
+void aplic_irq_mask(struct irq_data *d)
+{
+ struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
+
+ writel(d->hwirq, priv->regs + APLIC_CLRIENUM);
+}
+
+int aplic_irq_set_type(struct irq_data *d, unsigned int type)
+{
+ u32 val = 0;
+ void __iomem *sourcecfg;
+ struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
+
+ switch (type) {
+ case IRQ_TYPE_NONE:
+ val = APLIC_SOURCECFG_SM_INACTIVE;
+ break;
+ case IRQ_TYPE_LEVEL_LOW:
+ val = APLIC_SOURCECFG_SM_LEVEL_LOW;
+ break;
+ case IRQ_TYPE_LEVEL_HIGH:
+ val = APLIC_SOURCECFG_SM_LEVEL_HIGH;
+ break;
+ case IRQ_TYPE_EDGE_FALLING:
+ val = APLIC_SOURCECFG_SM_EDGE_FALL;
+ break;
+ case IRQ_TYPE_EDGE_RISING:
+ val = APLIC_SOURCECFG_SM_EDGE_RISE;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ sourcecfg = priv->regs + APLIC_SOURCECFG_BASE;
+ sourcecfg += (d->hwirq - 1) * sizeof(u32);
+ writel(val, sourcecfg);
+
+ return 0;
+}
+
+int aplic_irqdomain_translate(struct irq_fwspec *fwspec, u32 gsi_base,
+ unsigned long *hwirq, unsigned int *type)
+{
+ if (WARN_ON(fwspec->param_count < 2))
+ return -EINVAL;
+ if (WARN_ON(!fwspec->param[0]))
+ return -EINVAL;
+
+ /* For DT, gsi_base is always zero. */
+ *hwirq = fwspec->param[0] - gsi_base;
+ *type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
+
+ WARN_ON(*type == IRQ_TYPE_NONE);
+
+ return 0;
+}
+
+void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode)
+{
+ u32 val;
+#ifdef CONFIG_RISCV_M_MODE
+ u32 valH;
+
+ if (msi_mode) {
+ val = priv->msicfg.base_ppn;
+ valH = ((u64)priv->msicfg.base_ppn >> 32) &
+ APLIC_xMSICFGADDRH_BAPPN_MASK;
+ valH |= (priv->msicfg.lhxw & APLIC_xMSICFGADDRH_LHXW_MASK)
+ << APLIC_xMSICFGADDRH_LHXW_SHIFT;
+ valH |= (priv->msicfg.hhxw & APLIC_xMSICFGADDRH_HHXW_MASK)
+ << APLIC_xMSICFGADDRH_HHXW_SHIFT;
+ valH |= (priv->msicfg.lhxs & APLIC_xMSICFGADDRH_LHXS_MASK)
+ << APLIC_xMSICFGADDRH_LHXS_SHIFT;
+ valH |= (priv->msicfg.hhxs & APLIC_xMSICFGADDRH_HHXS_MASK)
+ << APLIC_xMSICFGADDRH_HHXS_SHIFT;
+ writel(val, priv->regs + APLIC_xMSICFGADDR);
+ writel(valH, priv->regs + APLIC_xMSICFGADDRH);
+ }
+#endif
+
+ /* Setup APLIC domaincfg register */
+ val = readl(priv->regs + APLIC_DOMAINCFG);
+ val |= APLIC_DOMAINCFG_IE;
+ if (msi_mode)
+ val |= APLIC_DOMAINCFG_DM;
+ writel(val, priv->regs + APLIC_DOMAINCFG);
+ if (readl(priv->regs + APLIC_DOMAINCFG) != val)
+ dev_warn(priv->dev, "unable to write 0x%x in domaincfg\n",
+ val);
+}
+
+static void aplic_init_hw_irqs(struct aplic_priv *priv)
+{
+ int i;
+
+ /* Disable all interrupts */
+ for (i = 0; i <= priv->nr_irqs; i += 32)
+ writel(-1U, priv->regs + APLIC_CLRIE_BASE +
+ (i / 32) * sizeof(u32));
+
+ /* Set interrupt type and default priority for all interrupts */
+ for (i = 1; i <= priv->nr_irqs; i++) {
+ writel(0, priv->regs + APLIC_SOURCECFG_BASE +
+ (i - 1) * sizeof(u32));
+ writel(APLIC_DEFAULT_PRIORITY,
+ priv->regs + APLIC_TARGET_BASE +
+ (i - 1) * sizeof(u32));
+ }
+
+ /* Clear APLIC domaincfg */
+ writel(0, priv->regs + APLIC_DOMAINCFG);
+}
+
+int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
+ void __iomem *regs)
+{
+ struct of_phandle_args parent;
+ int rc;
+
+ /*
+ * Currently, only an OF fwnode is supported; extend this
+ * function when adding ACPI support.
+ */
+ if (!is_of_node(dev->fwnode))
+ return -EINVAL;
+
+ /* Save device pointer and register base */
+ priv->dev = dev;
+ priv->regs = regs;
+
+ /* Find out number of interrupt sources */
+ rc = of_property_read_u32(to_of_node(dev->fwnode),
+ "riscv,num-sources",
+ &priv->nr_irqs);
+ if (rc) {
+ dev_err(dev, "failed to get number of interrupt sources\n");
+ return rc;
+ }
+
+ /*
+ * Find out number of IDCs based on parent interrupts
+ *
+ * If "msi-parent" property is present then we ignore the
+ * APLIC IDCs which forces the APLIC driver to use MSI mode.
+ */
+ if (!of_property_present(to_of_node(dev->fwnode), "msi-parent")) {
+ while (!of_irq_parse_one(to_of_node(dev->fwnode),
+ priv->nr_idcs, &parent))
+ priv->nr_idcs++;
+ }
+
+ /* Setup initial state of APLIC interrupts */
+ aplic_init_hw_irqs(priv);
+
+ return 0;
+}
+
+static int aplic_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ bool msi_mode = false;
+ struct resource *res;
+ void __iomem *regs;
+ int rc;
+
+ /* Map the MMIO registers */
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res) {
+ dev_err(dev, "failed to get MMIO resource\n");
+ return -EINVAL;
+ }
+ regs = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+ if (!regs) {
+ dev_err(dev, "failed map MMIO registers\n");
+ return -ENOMEM;
+ }
+
+ /*
+ * If the msi-parent property is present then setup APLIC MSI
+ * mode, otherwise setup APLIC direct mode.
+ */
+ if (is_of_node(dev->fwnode))
+ msi_mode = of_property_present(to_of_node(dev->fwnode),
+ "msi-parent");
+ if (msi_mode)
+ rc = -ENODEV;
+ else
+ rc = aplic_direct_setup(dev, regs);
+ if (rc) {
+ dev_err(dev, "failed setup APLIC in %s mode\n",
+ msi_mode ? "MSI" : "direct");
+ return rc;
+ }
+
+ return 0;
+}
+
+static const struct of_device_id aplic_match[] = {
+ { .compatible = "riscv,aplic" },
+ {}
+};
+
+static struct platform_driver aplic_driver = {
+ .driver = {
+ .name = "riscv-aplic",
+ .of_match_table = aplic_match,
+ },
+ .probe = aplic_probe,
+};
+builtin_platform_driver(aplic_driver);
diff --git a/drivers/irqchip/irq-riscv-aplic-main.h b/drivers/irqchip/irq-riscv-aplic-main.h
new file mode 100644
index 000000000000..474a04229334
--- /dev/null
+++ b/drivers/irqchip/irq-riscv-aplic-main.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2022 Ventana Micro Systems Inc.
+ */
+
+#ifndef _IRQ_RISCV_APLIC_MAIN_H
+#define _IRQ_RISCV_APLIC_MAIN_H
+
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/irqdomain.h>
+#include <linux/fwnode.h>
+
+#define APLIC_DEFAULT_PRIORITY 1
+
+struct aplic_msicfg {
+ phys_addr_t base_ppn;
+ u32 hhxs;
+ u32 hhxw;
+ u32 lhxs;
+ u32 lhxw;
+};
+
+struct aplic_priv {
+ struct device *dev;
+ u32 gsi_base;
+ u32 nr_irqs;
+ u32 nr_idcs;
+ void __iomem *regs;
+ struct aplic_msicfg msicfg;
+};
+
+void aplic_irq_unmask(struct irq_data *d);
+void aplic_irq_mask(struct irq_data *d);
+int aplic_irq_set_type(struct irq_data *d, unsigned int type);
+int aplic_irqdomain_translate(struct irq_fwspec *fwspec, u32 gsi_base,
+ unsigned long *hwirq, unsigned int *type);
+void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode);
+int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
+ void __iomem *regs);
+int aplic_direct_setup(struct device *dev, void __iomem *regs);
+
+#endif
diff --git a/include/linux/irqchip/riscv-aplic.h b/include/linux/irqchip/riscv-aplic.h
new file mode 100644
index 000000000000..97e198ea0109
--- /dev/null
+++ b/include/linux/irqchip/riscv-aplic.h
@@ -0,0 +1,119 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2022 Ventana Micro Systems Inc.
+ */
+#ifndef __LINUX_IRQCHIP_RISCV_APLIC_H
+#define __LINUX_IRQCHIP_RISCV_APLIC_H
+
+#include <linux/bitops.h>
+
+#define APLIC_MAX_IDC BIT(14)
+#define APLIC_MAX_SOURCE 1024
+
+#define APLIC_DOMAINCFG 0x0000
+#define APLIC_DOMAINCFG_RDONLY 0x80000000
+#define APLIC_DOMAINCFG_IE BIT(8)
+#define APLIC_DOMAINCFG_DM BIT(2)
+#define APLIC_DOMAINCFG_BE BIT(0)
+
+#define APLIC_SOURCECFG_BASE 0x0004
+#define APLIC_SOURCECFG_D BIT(10)
+#define APLIC_SOURCECFG_CHILDIDX_MASK 0x000003ff
+#define APLIC_SOURCECFG_SM_MASK 0x00000007
+#define APLIC_SOURCECFG_SM_INACTIVE 0x0
+#define APLIC_SOURCECFG_SM_DETACH 0x1
+#define APLIC_SOURCECFG_SM_EDGE_RISE 0x4
+#define APLIC_SOURCECFG_SM_EDGE_FALL 0x5
+#define APLIC_SOURCECFG_SM_LEVEL_HIGH 0x6
+#define APLIC_SOURCECFG_SM_LEVEL_LOW 0x7
+
+#define APLIC_MMSICFGADDR 0x1bc0
+#define APLIC_MMSICFGADDRH 0x1bc4
+#define APLIC_SMSICFGADDR 0x1bc8
+#define APLIC_SMSICFGADDRH 0x1bcc
+
+#ifdef CONFIG_RISCV_M_MODE
+#define APLIC_xMSICFGADDR APLIC_MMSICFGADDR
+#define APLIC_xMSICFGADDRH APLIC_MMSICFGADDRH
+#else
+#define APLIC_xMSICFGADDR APLIC_SMSICFGADDR
+#define APLIC_xMSICFGADDRH APLIC_SMSICFGADDRH
+#endif
+
+#define APLIC_xMSICFGADDRH_L BIT(31)
+#define APLIC_xMSICFGADDRH_HHXS_MASK 0x1f
+#define APLIC_xMSICFGADDRH_HHXS_SHIFT 24
+#define APLIC_xMSICFGADDRH_LHXS_MASK 0x7
+#define APLIC_xMSICFGADDRH_LHXS_SHIFT 20
+#define APLIC_xMSICFGADDRH_HHXW_MASK 0x7
+#define APLIC_xMSICFGADDRH_HHXW_SHIFT 16
+#define APLIC_xMSICFGADDRH_LHXW_MASK 0xf
+#define APLIC_xMSICFGADDRH_LHXW_SHIFT 12
+#define APLIC_xMSICFGADDRH_BAPPN_MASK 0xfff
+
+#define APLIC_xMSICFGADDR_PPN_SHIFT 12
+
+#define APLIC_xMSICFGADDR_PPN_HART(__lhxs) \
+ (BIT(__lhxs) - 1)
+
+#define APLIC_xMSICFGADDR_PPN_LHX_MASK(__lhxw) \
+ (BIT(__lhxw) - 1)
+#define APLIC_xMSICFGADDR_PPN_LHX_SHIFT(__lhxs) \
+ ((__lhxs))
+#define APLIC_xMSICFGADDR_PPN_LHX(__lhxw, __lhxs) \
+ (APLIC_xMSICFGADDR_PPN_LHX_MASK(__lhxw) << \
+ APLIC_xMSICFGADDR_PPN_LHX_SHIFT(__lhxs))
+
+#define APLIC_xMSICFGADDR_PPN_HHX_MASK(__hhxw) \
+ (BIT(__hhxw) - 1)
+#define APLIC_xMSICFGADDR_PPN_HHX_SHIFT(__hhxs) \
+ ((__hhxs) + APLIC_xMSICFGADDR_PPN_SHIFT)
+#define APLIC_xMSICFGADDR_PPN_HHX(__hhxw, __hhxs) \
+ (APLIC_xMSICFGADDR_PPN_HHX_MASK(__hhxw) << \
+ APLIC_xMSICFGADDR_PPN_HHX_SHIFT(__hhxs))
+
+#define APLIC_IRQBITS_PER_REG 32
+
+#define APLIC_SETIP_BASE 0x1c00
+#define APLIC_SETIPNUM 0x1cdc
+
+#define APLIC_CLRIP_BASE 0x1d00
+#define APLIC_CLRIPNUM 0x1ddc
+
+#define APLIC_SETIE_BASE 0x1e00
+#define APLIC_SETIENUM 0x1edc
+
+#define APLIC_CLRIE_BASE 0x1f00
+#define APLIC_CLRIENUM 0x1fdc
+
+#define APLIC_SETIPNUM_LE 0x2000
+#define APLIC_SETIPNUM_BE 0x2004
+
+#define APLIC_GENMSI 0x3000
+
+#define APLIC_TARGET_BASE 0x3004
+#define APLIC_TARGET_HART_IDX_SHIFT 18
+#define APLIC_TARGET_HART_IDX_MASK 0x3fff
+#define APLIC_TARGET_GUEST_IDX_SHIFT 12
+#define APLIC_TARGET_GUEST_IDX_MASK 0x3f
+#define APLIC_TARGET_IPRIO_MASK 0xff
+#define APLIC_TARGET_EIID_MASK 0x7ff
+
+#define APLIC_IDC_BASE 0x4000
+#define APLIC_IDC_SIZE 32
+
+#define APLIC_IDC_IDELIVERY 0x00
+
+#define APLIC_IDC_IFORCE 0x04
+
+#define APLIC_IDC_ITHRESHOLD 0x08
+
+#define APLIC_IDC_TOPI 0x18
+#define APLIC_IDC_TOPI_ID_SHIFT 16
+#define APLIC_IDC_TOPI_ID_MASK 0x3ff
+#define APLIC_IDC_TOPI_PRIO_MASK 0xff
+
+#define APLIC_IDC_CLAIMI 0x1c
+
+#endif
--
2.34.1
From: Thomas Gleixner <[email protected]>
Some platform-MSI implementations require that power management is
redirected to the underlying interrupt chip device. To make this work
with per device MSI domains, provide a new feature flag and let the
core code handle the setup of domain->pm_dev when the flag is set during
device MSI domain creation.
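For illustration only (not part of this patch), a driver creating a per
device wired-to-MSI domain could opt into this behaviour by setting the new
flag in its msi_domain_template; the template below is hypothetical and only
shows the fields relevant to the flag:

	/*
	 * Hypothetical sketch: with MSI_FLAG_PARENT_PM_DEV set in info.flags,
	 * the core copies parent->pm_dev into the newly created device MSI
	 * domain's pm_dev (see the kernel/irq/msi.c hunk below).
	 */
	static const struct msi_domain_template example_msi_template = {
		.chip = {
			.name = "EXAMPLE-MSI",
		},
		.info = {
			.bus_token = DOMAIN_BUS_WIRED_TO_MSI,
			.flags = MSI_FLAG_USE_DEV_FWNODE | MSI_FLAG_PARENT_PM_DEV,
		},
	};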
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
include/linux/msi.h | 2 ++
kernel/irq/msi.c | 5 ++++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index ac73f678da7d..b21581ca8e9a 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -554,6 +554,8 @@ enum {
MSI_FLAG_FREE_MSI_DESCS = (1 << 6),
/* Use dev->fwnode for MSI device domain creation */
MSI_FLAG_USE_DEV_FWNODE = (1 << 7),
+ /* Set parent->pm_dev into domain->pm_dev on device domain creation */
+ MSI_FLAG_PARENT_PM_DEV = (1 << 8),
/* Mask for the generic functionality */
MSI_GENERIC_FLAGS_MASK = GENMASK(15, 0),
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 07e9daaf0657..f90952ebc494 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -845,8 +845,11 @@ static struct irq_domain *__msi_create_irq_domain(struct fwnode_handle *fwnode,
domain = irq_domain_create_hierarchy(parent, flags | IRQ_DOMAIN_FLAG_MSI, 0,
fwnode, &msi_domain_ops, info);
- if (domain)
+ if (domain) {
irq_domain_update_bus_token(domain, info->bus_token);
+ if (info->flags & MSI_FLAG_PARENT_PM_DEV)
+ domain->pm_dev = parent->pm_dev;
+ }
return domain;
}
--
2.34.1
The QEMU virt machine supports AIA emulation and we also have
quite a few RISC-V platforms with AIA support under development,
so let us select the APLIC and IMSIC drivers for all RISC-V platforms.
Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
---
arch/riscv/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index bffbd869a068..569f2b6fd60a 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -162,6 +162,8 @@ config RISCV
select PCI_DOMAINS_GENERIC if PCI
select PCI_MSI if PCI
select RISCV_ALTERNATIVE if !XIP_KERNEL
+ select RISCV_APLIC
+ select RISCV_IMSIC
select RISCV_INTC
select RISCV_TIMER if RISCV_SBI
select SIFIVE_PLIC
--
2.34.1
Add myself as maintainer for the RISC-V AIA drivers, including the
RISC-V INTC driver which supports both AIA and non-AIA platforms.
Signed-off-by: Anup Patel <[email protected]>
---
MAINTAINERS | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 8d1052fa6a69..49dd12e90609 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18792,6 +18792,20 @@ S: Maintained
F: drivers/mtd/nand/raw/r852.c
F: drivers/mtd/nand/raw/r852.h
+RISC-V AIA DRIVERS
+M: Anup Patel <[email protected]>
+L: [email protected]
+S: Maintained
+F: Documentation/devicetree/bindings/interrupt-controller/riscv,aplic.yaml
+F: Documentation/devicetree/bindings/interrupt-controller/riscv,imsics.yaml
+F: drivers/irqchip/irq-riscv-aplic-*.c
+F: drivers/irqchip/irq-riscv-aplic-*.h
+F: drivers/irqchip/irq-riscv-imsic-*.c
+F: drivers/irqchip/irq-riscv-imsic-*.h
+F: drivers/irqchip/irq-riscv-intc.c
+F: include/linux/irqchip/riscv-aplic.h
+F: include/linux/irqchip/riscv-imsic.h
+
RISC-V ARCHITECTURE
M: Paul Walmsley <[email protected]>
M: Palmer Dabbelt <[email protected]>
--
2.34.1
We add a DT bindings document for the RISC-V incoming MSI controller
(IMSIC) defined by the RISC-V advanced interrupt architecture (AIA)
specification.
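As a worked illustration (not part of the binding text itself), consider the
MSI target address layout described in the schema below with
riscv,guest-index-bits = 0, riscv,hart-index-bits = 2,
riscv,group-index-bits = 1 and riscv,group-index-shift = 24 (the geometry of
Example 2 in the schema); the numeric values are assumptions for the example:

	/*
	 * Supervisor-level interrupt file of the HART with index 3 in group 1,
	 * assuming a group 0 base address of 0x28000000 and 4KB interrupt files.
	 */
	unsigned long base = 0x28000000UL;	/* group 0 base address */
	unsigned long addr = base
			   | (1UL << 24)	/* group index 1 */
			   | (3UL << 12);	/* hart index 3, guest index 0 */
	/* addr == 0x29003000 */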
Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
---
.../interrupt-controller/riscv,imsics.yaml | 172 ++++++++++++++++++
1 file changed, 172 insertions(+)
create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,imsics.yaml
diff --git a/Documentation/devicetree/bindings/interrupt-controller/riscv,imsics.yaml b/Documentation/devicetree/bindings/interrupt-controller/riscv,imsics.yaml
new file mode 100644
index 000000000000..84976f17a4a1
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/riscv,imsics.yaml
@@ -0,0 +1,172 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/riscv,imsics.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RISC-V Incoming MSI Controller (IMSIC)
+
+maintainers:
+ - Anup Patel <[email protected]>
+
+description: |
+ The RISC-V advanced interrupt architecture (AIA) defines a per-CPU incoming
+ MSI controller (IMSIC) for handling MSIs in a RISC-V platform. The RISC-V
+ AIA specification can be found at https://github.com/riscv/riscv-aia.
+
+ The IMSIC is a per-CPU (or per-HART) device with a separate interrupt file
+ for each privilege level (machine or supervisor). The configuration of
+ an IMSIC interrupt file is done using AIA CSRs and it also has a 4KB MMIO
+ space to receive MSIs from devices. Each IMSIC interrupt file supports a
+ fixed number of interrupt identities (to distinguish MSIs from devices)
+ which is the same for a given privilege level across CPUs (or HARTs).
+
+ The device tree of a RISC-V platform will have one IMSIC device tree node
+ for each privilege level (machine or supervisor), which collectively describes
+ the IMSIC interrupt files at that privilege level across CPUs (or HARTs).
+
+ The arrangement of IMSIC interrupt files in the MMIO space of a RISC-V
+ platform follows a particular scheme defined by the RISC-V AIA specification.
+ An IMSIC group is a set of IMSIC interrupt files co-located in MMIO space,
+ and we can have multiple IMSIC groups (i.e. clusters, sockets, chiplets,
+ etc.) in a RISC-V platform. The MSI target address of an IMSIC interrupt
+ file at a given privilege level (machine or supervisor) encodes the group
+ index, HART index, and guest index (shown below).
+
+ XLEN-1 > (HART Index MSB) 12 0
+ | | | |
+ -------------------------------------------------------------
+ |xxxxxx|Group Index|xxxxxxxxxxx|HART Index|Guest Index| 0 |
+ -------------------------------------------------------------
+
+allOf:
+ - $ref: /schemas/interrupt-controller.yaml#
+ - $ref: /schemas/interrupt-controller/msi-controller.yaml#
+
+properties:
+ compatible:
+ items:
+ - enum:
+ - qemu,imsics
+ - const: riscv,imsics
+
+ reg:
+ minItems: 1
+ maxItems: 16384
+ description:
+ Base address of each IMSIC group.
+
+ interrupt-controller: true
+
+ "#interrupt-cells":
+ const: 0
+
+ msi-controller: true
+
+ "#msi-cells":
+ const: 0
+
+ interrupts-extended:
+ minItems: 1
+ maxItems: 16384
+ description:
+ This property represents the set of CPUs (or HARTs) for which the given
+ device tree node describes the IMSIC interrupt files. Each node pointed
+ to should be a riscv,cpu-intc node, which has a CPU node (i.e. RISC-V
+ HART) as parent.
+
+ riscv,num-ids:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 63
+ maximum: 2047
+ description:
+ Number of interrupt identities supported by an IMSIC interrupt file.
+
+ riscv,num-guest-ids:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 63
+ maximum: 2047
+ description:
+ Number of interrupt identities supported by an IMSIC guest interrupt
+ file. When not specified, it is assumed to be the same as specified by
+ the riscv,num-ids property.
+
+ riscv,guest-index-bits:
+ minimum: 0
+ maximum: 7
+ default: 0
+ description:
+ Number of guest index bits in the MSI target address.
+
+ riscv,hart-index-bits:
+ minimum: 0
+ maximum: 15
+ description:
+ Number of HART index bits in the MSI target address. When not
+ specified it is calculated based on the interrupts-extended property.
+
+ riscv,group-index-bits:
+ minimum: 0
+ maximum: 7
+ default: 0
+ description:
+ Number of group index bits in the MSI target address.
+
+ riscv,group-index-shift:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 55
+ default: 24
+ description:
+ The least significant bit position of the group index bits in the
+ MSI target address.
+
+required:
+ - compatible
+ - reg
+ - interrupt-controller
+ - msi-controller
+ - "#msi-cells"
+ - interrupts-extended
+ - riscv,num-ids
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ // Example 1 (Machine-level IMSIC files with just one group):
+
+ interrupt-controller@24000000 {
+ compatible = "qemu,imsics", "riscv,imsics";
+ interrupts-extended = <&cpu1_intc 11>,
+ <&cpu2_intc 11>,
+ <&cpu3_intc 11>,
+ <&cpu4_intc 11>;
+ reg = <0x24000000 0x4000>;
+ interrupt-controller;
+ #interrupt-cells = <0>;
+ msi-controller;
+ #msi-cells = <0>;
+ riscv,num-ids = <127>;
+ };
+
+ - |
+ // Example 2 (Supervisor-level IMSIC files with two groups):
+
+ interrupt-controller@28000000 {
+ compatible = "qemu,imsics", "riscv,imsics";
+ interrupts-extended = <&cpu1_intc 9>,
+ <&cpu2_intc 9>,
+ <&cpu3_intc 9>,
+ <&cpu4_intc 9>;
+ reg = <0x28000000 0x2000>, /* Group0 IMSICs */
+ <0x29000000 0x2000>; /* Group1 IMSICs */
+ interrupt-controller;
+ #interrupt-cells = <0>;
+ msi-controller;
+ #msi-cells = <0>;
+ riscv,num-ids = <127>;
+ riscv,group-index-bits = <1>;
+ riscv,group-index-shift = <24>;
+ };
+...
--
2.34.1
The RISC-V advanced platform-level interrupt controller (APLIC) has
two modes of operation: 1) Direct mode and 2) MSI mode.
(For more details, refer to https://github.com/riscv/riscv-aia)
In APLIC MSI-mode, wired interrupts are forwarded as message signaled
interrupts (MSIs) to CPUs via the IMSIC.
We extend the existing APLIC irqchip driver to support MSI-mode for
RISC-V platforms having both wired interrupts and MSIs.
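As a worked illustration (not part of the patch itself), assume an IMSIC
geometry of guest-index-bits (LHXS) = 0, hart-index-bits (LHXW) = 2 and
group-index-bits (HHXW) = 0, with the IMSIC base at 0x28000000. An MSI
composed by the parent IMSIC domain for the interrupt file at 0x28003000 with
message data 5 then decodes to hart index 3, guest index 0 and EIID 5; the
register macros, priv->regs and hwirq names follow the driver code below,
while the numeric values are assumptions for the example:

	u32 val = (3 << APLIC_TARGET_HART_IDX_SHIFT) |	/* hart index */
		  (0 << APLIC_TARGET_GUEST_IDX_SHIFT) |	/* guest index */
		  5;					/* EIID from msg->data */
	writel(val, priv->regs + APLIC_TARGET_BASE + (hwirq - 1) * sizeof(u32));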
Signed-off-by: Anup Patel <[email protected]>
---
drivers/irqchip/Kconfig | 6 +
drivers/irqchip/Makefile | 1 +
drivers/irqchip/irq-riscv-aplic-main.c | 2 +-
drivers/irqchip/irq-riscv-aplic-main.h | 8 +
drivers/irqchip/irq-riscv-aplic-msi.c | 256 +++++++++++++++++++++++++
5 files changed, 272 insertions(+), 1 deletion(-)
create mode 100644 drivers/irqchip/irq-riscv-aplic-msi.c
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index dbc8811d3764..806b5fccb3e8 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -551,6 +551,12 @@ config RISCV_APLIC
depends on RISCV
select IRQ_DOMAIN_HIERARCHY
+config RISCV_APLIC_MSI
+ bool
+ depends on RISCV_APLIC
+ select GENERIC_MSI_IRQ
+ default RISCV_APLIC
+
config RISCV_IMSIC
bool
depends on RISCV
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index 7f8289790ed8..47995fdb2c60 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_CSKY_MPINTC) += irq-csky-mpintc.o
obj-$(CONFIG_CSKY_APB_INTC) += irq-csky-apb-intc.o
obj-$(CONFIG_RISCV_INTC) += irq-riscv-intc.o
obj-$(CONFIG_RISCV_APLIC) += irq-riscv-aplic-main.o irq-riscv-aplic-direct.o
+obj-$(CONFIG_RISCV_APLIC_MSI) += irq-riscv-aplic-msi.o
obj-$(CONFIG_RISCV_IMSIC) += irq-riscv-imsic-state.o irq-riscv-imsic-early.o irq-riscv-imsic-platform.o
obj-$(CONFIG_SIFIVE_PLIC) += irq-sifive-plic.o
obj-$(CONFIG_IMX_IRQSTEER) += irq-imx-irqsteer.o
diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
index 87450708a733..d1b342b66551 100644
--- a/drivers/irqchip/irq-riscv-aplic-main.c
+++ b/drivers/irqchip/irq-riscv-aplic-main.c
@@ -205,7 +205,7 @@ static int aplic_probe(struct platform_device *pdev)
msi_mode = of_property_present(to_of_node(dev->fwnode),
"msi-parent");
if (msi_mode)
- rc = -ENODEV;
+ rc = aplic_msi_setup(dev, regs);
else
rc = aplic_direct_setup(dev, regs);
if (rc) {
diff --git a/drivers/irqchip/irq-riscv-aplic-main.h b/drivers/irqchip/irq-riscv-aplic-main.h
index 474a04229334..78267ec58098 100644
--- a/drivers/irqchip/irq-riscv-aplic-main.h
+++ b/drivers/irqchip/irq-riscv-aplic-main.h
@@ -41,5 +41,13 @@ void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode);
int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
void __iomem *regs);
int aplic_direct_setup(struct device *dev, void __iomem *regs);
+#ifdef CONFIG_RISCV_APLIC_MSI
+int aplic_msi_setup(struct device *dev, void __iomem *regs);
+#else
+static inline int aplic_msi_setup(struct device *dev, void __iomem *regs)
+{
+ return -ENODEV;
+}
+#endif
#endif
diff --git a/drivers/irqchip/irq-riscv-aplic-msi.c b/drivers/irqchip/irq-riscv-aplic-msi.c
new file mode 100644
index 000000000000..8d7d1b3d1247
--- /dev/null
+++ b/drivers/irqchip/irq-riscv-aplic-msi.c
@@ -0,0 +1,256 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2022 Ventana Micro Systems Inc.
+ */
+
+#include <linux/bitops.h>
+#include <linux/cpu.h>
+#include <linux/interrupt.h>
+#include <linux/irqchip.h>
+#include <linux/irqchip/riscv-aplic.h>
+#include <linux/irqchip/riscv-imsic.h>
+#include <linux/module.h>
+#include <linux/msi.h>
+#include <linux/of_irq.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/smp.h>
+
+#include "irq-riscv-aplic-main.h"
+
+static void aplic_msi_irq_unmask(struct irq_data *d)
+{
+ aplic_irq_unmask(d);
+ irq_chip_unmask_parent(d);
+}
+
+static void aplic_msi_irq_mask(struct irq_data *d)
+{
+ aplic_irq_mask(d);
+ irq_chip_mask_parent(d);
+}
+
+static void aplic_msi_irq_eoi(struct irq_data *d)
+{
+ struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
+ u32 reg_off, reg_mask;
+
+ /*
+ * EOI handling is only required for level-triggered
+ * interrupts in APLIC MSI mode.
+ */
+
+ reg_off = APLIC_CLRIP_BASE + ((d->hwirq / APLIC_IRQBITS_PER_REG) * 4);
+ reg_mask = BIT(d->hwirq % APLIC_IRQBITS_PER_REG);
+ switch (irqd_get_trigger_type(d)) {
+ case IRQ_TYPE_LEVEL_LOW:
+ if (!(readl(priv->regs + reg_off) & reg_mask))
+ writel(d->hwirq, priv->regs + APLIC_SETIPNUM_LE);
+ break;
+ case IRQ_TYPE_LEVEL_HIGH:
+ if (readl(priv->regs + reg_off) & reg_mask)
+ writel(d->hwirq, priv->regs + APLIC_SETIPNUM_LE);
+ break;
+ }
+}
+
+static void aplic_msi_write_msg(struct irq_data *d, struct msi_msg *msg)
+{
+ unsigned int group_index, hart_index, guest_index, val;
+ struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
+ struct aplic_msicfg *mc = &priv->msicfg;
+ phys_addr_t tppn, tbppn, msg_addr;
+ void __iomem *target;
+
+ /* For zeroed MSI, simply write zero into the target register */
+ if (!msg->address_hi && !msg->address_lo && !msg->data) {
+ target = priv->regs + APLIC_TARGET_BASE;
+ target += (d->hwirq - 1) * sizeof(u32);
+ writel(0, target);
+ return;
+ }
+
+ /* Sanity check on message data */
+ WARN_ON(msg->data > APLIC_TARGET_EIID_MASK);
+
+ /* Compute target MSI address */
+ msg_addr = (((u64)msg->address_hi) << 32) | msg->address_lo;
+ tppn = msg_addr >> APLIC_xMSICFGADDR_PPN_SHIFT;
+
+ /* Compute target HART Base PPN */
+ tbppn = tppn;
+ tbppn &= ~APLIC_xMSICFGADDR_PPN_HART(mc->lhxs);
+ tbppn &= ~APLIC_xMSICFGADDR_PPN_LHX(mc->lhxw, mc->lhxs);
+ tbppn &= ~APLIC_xMSICFGADDR_PPN_HHX(mc->hhxw, mc->hhxs);
+ WARN_ON(tbppn != mc->base_ppn);
+
+ /* Compute target group and hart indexes */
+ group_index = (tppn >> APLIC_xMSICFGADDR_PPN_HHX_SHIFT(mc->hhxs)) &
+ APLIC_xMSICFGADDR_PPN_HHX_MASK(mc->hhxw);
+ hart_index = (tppn >> APLIC_xMSICFGADDR_PPN_LHX_SHIFT(mc->lhxs)) &
+ APLIC_xMSICFGADDR_PPN_LHX_MASK(mc->lhxw);
+ hart_index |= (group_index << mc->lhxw);
+ WARN_ON(hart_index > APLIC_TARGET_HART_IDX_MASK);
+
+ /* Compute target guest index */
+ guest_index = tppn & APLIC_xMSICFGADDR_PPN_HART(mc->lhxs);
+ WARN_ON(guest_index > APLIC_TARGET_GUEST_IDX_MASK);
+
+ /* Update IRQ TARGET register */
+ target = priv->regs + APLIC_TARGET_BASE;
+ target += (d->hwirq - 1) * sizeof(u32);
+ val = (hart_index & APLIC_TARGET_HART_IDX_MASK)
+ << APLIC_TARGET_HART_IDX_SHIFT;
+ val |= (guest_index & APLIC_TARGET_GUEST_IDX_MASK)
+ << APLIC_TARGET_GUEST_IDX_SHIFT;
+ val |= (msg->data & APLIC_TARGET_EIID_MASK);
+ writel(val, target);
+}
+
+static void aplic_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+ arg->desc = desc;
+ arg->hwirq = (u32)desc->data.icookie.value;
+}
+
+static int aplic_msi_translate(struct irq_domain *d, struct irq_fwspec *fwspec,
+ unsigned long *hwirq, unsigned int *type)
+{
+ struct msi_domain_info *info = d->host_data;
+ struct aplic_priv *priv = info->data;
+
+ return aplic_irqdomain_translate(fwspec, priv->gsi_base, hwirq, type);
+}
+
+static const struct msi_domain_template aplic_msi_template = {
+ .chip = {
+ .name = "APLIC-MSI",
+ .irq_mask = aplic_msi_irq_mask,
+ .irq_unmask = aplic_msi_irq_unmask,
+ .irq_set_type = aplic_irq_set_type,
+ .irq_eoi = aplic_msi_irq_eoi,
+#ifdef CONFIG_SMP
+ .irq_set_affinity = irq_chip_set_affinity_parent,
+#endif
+ .irq_write_msi_msg = aplic_msi_write_msg,
+ .flags = IRQCHIP_SET_TYPE_MASKED |
+ IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_MASK_ON_SUSPEND,
+ },
+
+ .ops = {
+ .set_desc = aplic_msi_set_desc,
+ .msi_translate = aplic_msi_translate,
+ },
+
+ .info = {
+ .bus_token = DOMAIN_BUS_WIRED_TO_MSI,
+ .flags = MSI_FLAG_USE_DEV_FWNODE,
+ .handler = handle_fasteoi_irq,
+ },
+};
+
+int aplic_msi_setup(struct device *dev, void __iomem *regs)
+{
+ const struct imsic_global_config *imsic_global;
+ struct aplic_priv *priv;
+ struct aplic_msicfg *mc;
+ phys_addr_t pa;
+ int rc;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ rc = aplic_setup_priv(priv, dev, regs);
+ if (rc) {
+ dev_err(dev, "failed to create APLIC context\n");
+ return rc;
+ }
+ mc = &priv->msicfg;
+
+ /*
+ * The APLIC outgoing MSI config registers assume the target MSI
+ * controller to be a RISC-V AIA IMSIC controller.
+ */
+ imsic_global = imsic_get_global_config();
+ if (!imsic_global) {
+ dev_err(dev, "IMSIC global config not found\n");
+ return -ENODEV;
+ }
+
+ /* Find number of guest index bits (LHXS) */
+ mc->lhxs = imsic_global->guest_index_bits;
+ if (APLIC_xMSICFGADDRH_LHXS_MASK < mc->lhxs) {
+ dev_err(dev, "IMSIC guest index bits big for APLIC LHXS\n");
+ return -EINVAL;
+ }
+
+ /* Find number of HART index bits (LHXW) */
+ mc->lhxw = imsic_global->hart_index_bits;
+ if (APLIC_xMSICFGADDRH_LHXW_MASK < mc->lhxw) {
+ dev_err(dev, "IMSIC hart index bits big for APLIC LHXW\n");
+ return -EINVAL;
+ }
+
+ /* Find number of group index bits (HHXW) */
+ mc->hhxw = imsic_global->group_index_bits;
+ if (APLIC_xMSICFGADDRH_HHXW_MASK < mc->hhxw) {
+ dev_err(dev, "IMSIC group index bits big for APLIC HHXW\n");
+ return -EINVAL;
+ }
+
+ /* Find first bit position of group index (HHXS) */
+ mc->hhxs = imsic_global->group_index_shift;
+ if (mc->hhxs < (2 * APLIC_xMSICFGADDR_PPN_SHIFT)) {
+ dev_err(dev, "IMSIC group index shift should be >= %d\n",
+ (2 * APLIC_xMSICFGADDR_PPN_SHIFT));
+ return -EINVAL;
+ }
+ mc->hhxs -= (2 * APLIC_xMSICFGADDR_PPN_SHIFT);
+ if (APLIC_xMSICFGADDRH_HHXS_MASK < mc->hhxs) {
+ dev_err(dev, "IMSIC group index shift big for APLIC HHXS\n");
+ return -EINVAL;
+ }
+
+ /* Compute PPN base */
+ mc->base_ppn = imsic_global->base_addr >> APLIC_xMSICFGADDR_PPN_SHIFT;
+ mc->base_ppn &= ~APLIC_xMSICFGADDR_PPN_HART(mc->lhxs);
+ mc->base_ppn &= ~APLIC_xMSICFGADDR_PPN_LHX(mc->lhxw, mc->lhxs);
+ mc->base_ppn &= ~APLIC_xMSICFGADDR_PPN_HHX(mc->hhxw, mc->hhxs);
+
+ /* Setup global config and interrupt delivery */
+ aplic_init_hw_global(priv, true);
+
+ /* Set the APLIC device MSI domain if not available */
+ if (!dev_get_msi_domain(dev)) {
+ /*
+ * The device MSI domain for OF devices is only set at the
+ * time of populating/creating the OF device. If the device MSI
+ * domain is discovered later, after the OF device is created,
+ * then we need to set it explicitly before using any platform
+ * MSI functions.
+ *
+ * In the case of the APLIC device, the parent MSI domain is
+ * always the IMSIC, and the IMSIC MSI domains are created later
+ * through platform driver probing, so we set it explicitly here.
+ */
+ if (is_of_node(dev->fwnode))
+ of_msi_configure(dev, to_of_node(dev->fwnode));
+ }
+
+ if (!msi_create_device_irq_domain(dev, MSI_DEFAULT_DOMAIN,
+ &aplic_msi_template,
+ priv->nr_irqs + 1, priv, priv)) {
+ dev_err(dev, "failed to create MSI irq domain\n");
+ return -ENOMEM;
+ }
+
+ /* Advertise the interrupt controller */
+ pa = priv->msicfg.base_ppn << APLIC_xMSICFGADDR_PPN_SHIFT;
+ dev_info(dev, "%d interrupts forwared to MSI base %pa\n",
+ priv->nr_irqs, &pa);
+
+ return 0;
+}
--
2.34.1
Hi Thomas,
On Sat, Jan 27, 2024 at 9:48 PM Anup Patel <[email protected]> wrote:
>
> The RISC-V AIA specification is ratified as-per the RISC-V international
> process. The latest ratified AIA specifcation can be found at:
> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>
> At a high-level, the AIA specification adds three things:
> 1) AIA CSRs
> - Improved local interrupt support
> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> - Per-HART MSI controller
> - Support MSI virtualization
> - Support IPI along with virtualization
> 3) Advanced Platform-Level Interrupt Controller (APLIC)
> - Wired interrupt controller
> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> - In Direct-mode, injects external interrupts directly into HARTs
>
> For an overview of the AIA specification, refer the AIA virtualization
> talk at KVM Forum 2022:
> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> https://www.youtube.com/watch?v=r071dL8Z0yo
>
> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
>
> These patches can also be found in the riscv_aia_v12 branch at:
> https://github.com/avpatel/linux.git
>
> Changes since v11:
> - Rebased on Linux-6.8-rc1
> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> MSI handling to per device MSI domains" series by Thomas.
> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> - Updated IMSIC driver to support per-device MSI domains for PCI and
> platform devices.
>
> Changes since v10:
> - Rebased on Linux-6.6-rc7
> - Dropped PATCH3 of v10 series since this has been merged by MarcZ
> for Linux-6.6-rc7
> - Changed the IMSIC ID management strategy from 1-n approach to
> x86-style 1-1 approach
>
> Changes since v9:
> - Rebased on Linux-6.6-rc4
> - Use builtin_platform_driver() in PATCH5, PATCH9, and PATCH12
>
> Changes since v8:
> - Rebased on Linux-6.6-rc3
> - Dropped PATCH2 of v8 series since we won't be requiring
> riscv_get_intc_hartid() based on Marc Z's comments on ACPI AIA support.
> - Addressed Saravana's comments in PATCH3 of v8 series
> - Update PATCH9 and PATCH13 of v8 series based on comments from Sunil
>
> Changes since v7:
> - Rebased on Linux-6.6-rc1
> - Addressed comments on PATCH1 of v7 series and split it into two PATCHes
> - Use DEFINE_SIMPLE_PROP() in PATCH2 of v7 series
>
> Changes since v6:
> - Rebased on Linux-6.5-rc4
> - Updated PATCH2 to use IS_ENABLED(CONFIG_SPARC) instead of
> !IS_ENABLED(CONFIG_OF_IRQ)
> - Added new PATCH4 to fix syscore registration in PLIC driver
> - Update PATCH5 to convert PLIC driver into full-blown platform driver
> with a re-written probe function.
>
> Changes since v5:
> - Rebased on Linux-6.5-rc2
> - Updated the overall series to ensure that only IPI, timer, and
> INTC drivers are probed very early whereas rest of the interrupt
> controllers (such as PLIC, APLIC, and IMISC) are probed as
> regular platform drivers.
> - Renamed riscv_fw_parent_hartid() to riscv_get_intc_hartid()
> - New PATCH1 to add fw_devlink support for msi-parent DT property
> - New PATCH2 to ensure all INTC suppliers are initialized which in-turn
> fixes the probing issue for PLIC, APLIC and IMSIC as platform driver
> - New PATCH3 to use platform driver probing for PLIC
> - Re-structured the IMSIC driver into two separate drivers: early and
> platform. The IMSIC early driver (PATCH7) only initialized IMSIC state
> and provides IPIs whereas the IMSIC platform driver (PATCH8) is probed
> provides MSI domain for platform devices.
> - Re-structure the APLIC platform driver into three separe sources: main,
> direct mode, and MSI mode.
>
> Changes since v4:
> - Rebased on Linux-6.5-rc1
> - Added "Dependencies" in the APLIC bindings (PATCH6 in v4)
> - Dropped the PATCH6 which was changing the IOMMU DMA domain APIs
> - Dropped use of IOMMU DMA APIs in the IMSIC driver (PATCH4)
>
> Changes since v3:
> - Rebased on Linux-6.4-rc6
> - Droped PATCH2 of v3 series instead we now set FWNODE_FLAG_BEST_EFFORT via
> IRQCHIP_DECLARE()
> - Extend riscv_fw_parent_hartid() to support both DT and ACPI in PATCH1
> - Extend iommu_dma_compose_msi_msg() instead of adding iommu_dma_select_msi()
> in PATCH6
> - Addressed Conor's comments in PATCH3
> - Addressed Conor's and Rob's comments in PATCH7
>
> Changes since v2:
> - Rebased on Linux-6.4-rc1
> - Addressed Rob's comments on DT bindings patches 4 and 8.
> - Addessed Marc's comments on IMSIC driver PATCH5
> - Replaced use of OF apis in APLIC and IMSIC drivers with FWNODE apis
> this makes both drivers easily portable for ACPI support. This also
> removes unnecessary indirection from the APLIC and IMSIC drivers.
> - PATCH1 is a new patch for portability with ACPI support
> - PATCH2 is a new patch to fix probing in APLIC drivers for APLIC-only systems.
> - PATCH7 is a new patch which addresses the IOMMU DMA domain issues pointed
> out by SiFive
>
> Changes since v1:
> - Rebased on Linux-6.2-rc2
> - Addressed comments on IMSIC DT bindings for PATCH4
> - Use raw_spin_lock_irqsave() on ids_lock for PATCH5
> - Improved MMIO alignment checks in PATCH5 to allow MMIO regions
> with holes.
> - Addressed comments on APLIC DT bindings for PATCH6
> - Fixed warning splat in aplic_msi_write_msg() caused by
> zeroed MSI message in PATCH7
> - Dropped DT property riscv,slow-ipi instead will have module
> parameter in future.
>
> Anup Patel (11):
> irqchip/sifive-plic: Convert PLIC driver into a platform driver
> irqchip/riscv-intc: Add support for RISC-V AIA
> dt-bindings: interrupt-controller: Add RISC-V incoming MSI controller
> irqchip: Add RISC-V incoming MSI controller early driver
> irqchip/riscv-imsic: Add device MSI domain support for platform
> devices
> irqchip/riscv-imsic: Add device MSI domain support for PCI devices
> dt-bindings: interrupt-controller: Add RISC-V advanced PLIC
> irqchip: Add RISC-V advanced PLIC driver for direct-mode
> irqchip/riscv-aplic: Add support for MSI-mode
> RISC-V: Select APLIC and IMSIC drivers
> MAINTAINERS: Add entry for RISC-V AIA drivers
>
> Björn Töpel (1):
> genirq/matrix: Dynamic bitmap allocation
>
> Thomas Gleixner (13):
> irqchip/gic-v3: Make gic_irq_domain_select() robust for zero parameter
> count
> genirq/irqdomain: Remove the param count restriction from select()
> genirq/msi: Extend msi_parent_ops
> genirq/irqdomain: Add DOMAIN_BUS_DEVICE_IMS
> platform-msi: Prepare for real per device domains
> irqchip: Convert all platform MSI users to the new API
> genirq/msi: Provide optional translation op
> genirq/msi: Split msi_domain_alloc_irq_at()
> genirq/msi: Provide DOMAIN_BUS_WIRED_TO_MSI
> genirq/msi: Optionally use dev->fwnode for device domain
> genirq/msi: Provide allocation/free functions for "wired" MSI
> interrupts
> genirq/irqdomain: Reroute device MSI create_mapping
> genirq/msi: Provide MSI_FLAG_PARENT_PM_DEV
I have rebased and included 13 patches (which add per-device MSI domain
infrastructure) from your series [1]. In this series, the IMSIC driver
implements the msi_parent_ops and the APLIC driver implements a wired-to-MSI
bridge using your new infrastructure.
The remaining 27 patches of your series [1] require testing on ARM
platforms which I don't have. I suggest the remaining patches go as a
separate series.
I hope you are okay with this approach.
Best Regards,
Anup
[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> .../interrupt-controller/riscv,aplic.yaml | 172 ++++
> .../interrupt-controller/riscv,imsics.yaml | 172 ++++
> MAINTAINERS | 14 +
> arch/riscv/Kconfig | 2 +
> arch/x86/include/asm/hw_irq.h | 2 -
> drivers/base/platform-msi.c | 97 ++
> drivers/dma/mv_xor_v2.c | 8 +-
> drivers/dma/qcom/hidma.c | 6 +-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +-
> drivers/irqchip/Kconfig | 25 +
> drivers/irqchip/Makefile | 3 +
> drivers/irqchip/irq-gic-v3.c | 6 +-
> drivers/irqchip/irq-riscv-aplic-direct.c | 343 +++++++
> drivers/irqchip/irq-riscv-aplic-main.c | 232 +++++
> drivers/irqchip/irq-riscv-aplic-main.h | 53 ++
> drivers/irqchip/irq-riscv-aplic-msi.c | 256 +++++
> drivers/irqchip/irq-riscv-imsic-early.c | 241 +++++
> drivers/irqchip/irq-riscv-imsic-platform.c | 403 ++++++++
> drivers/irqchip/irq-riscv-imsic-state.c | 887 ++++++++++++++++++
> drivers/irqchip/irq-riscv-imsic-state.h | 105 +++
> drivers/irqchip/irq-riscv-intc.c | 34 +-
> drivers/irqchip/irq-sifive-plic.c | 239 +++--
> drivers/mailbox/bcm-flexrm-mailbox.c | 8 +-
> drivers/perf/arm_smmuv3_pmu.c | 4 +-
> drivers/ufs/host/ufs-qcom.c | 8 +-
> include/linux/irqchip/riscv-aplic.h | 119 +++
> include/linux/irqchip/riscv-imsic.h | 87 ++
> include/linux/irqdomain.h | 17 +
> include/linux/irqdomain_defs.h | 2 +
> include/linux/msi.h | 21 +
> kernel/irq/irqdomain.c | 28 +-
> kernel/irq/matrix.c | 28 +-
> kernel/irq/msi.c | 184 +++-
> 33 files changed, 3636 insertions(+), 175 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,aplic.yaml
> create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,imsics.yaml
> create mode 100644 drivers/irqchip/irq-riscv-aplic-direct.c
> create mode 100644 drivers/irqchip/irq-riscv-aplic-main.c
> create mode 100644 drivers/irqchip/irq-riscv-aplic-main.h
> create mode 100644 drivers/irqchip/irq-riscv-aplic-msi.c
> create mode 100644 drivers/irqchip/irq-riscv-imsic-early.c
> create mode 100644 drivers/irqchip/irq-riscv-imsic-platform.c
> create mode 100644 drivers/irqchip/irq-riscv-imsic-state.c
> create mode 100644 drivers/irqchip/irq-riscv-imsic-state.h
> create mode 100644 include/linux/irqchip/riscv-aplic.h
> create mode 100644 include/linux/irqchip/riscv-imsic.h
>
> --
> 2.34.1
>
We add a DT bindings document for the RISC-V advanced platform-level interrupt
controller (APLIC) defined by the RISC-V advanced interrupt architecture
(AIA) specification.
Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
---
.../interrupt-controller/riscv,aplic.yaml | 172 ++++++++++++++++++
1 file changed, 172 insertions(+)
create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,aplic.yaml
diff --git a/Documentation/devicetree/bindings/interrupt-controller/riscv,aplic.yaml b/Documentation/devicetree/bindings/interrupt-controller/riscv,aplic.yaml
new file mode 100644
index 000000000000..190a6499c932
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/riscv,aplic.yaml
@@ -0,0 +1,172 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RISC-V Advanced Platform Level Interrupt Controller (APLIC)
+
+maintainers:
+ - Anup Patel <[email protected]>
+
+description:
+ The RISC-V advanced interrupt architecture (AIA) defines an advanced
+ platform level interrupt controller (APLIC) for handling wired interrupts
+ in a RISC-V platform. The RISC-V AIA specification can be found at
+ https://github.com/riscv/riscv-aia.
+
+ The RISC-V APLIC is implemented as hierarchical APLIC domains where all
+ interrupt sources connect to the root APLIC domain and a parent APLIC
+ domain can delegate interrupt sources to its child APLIC domains. There
+ is one device tree node for each APLIC domain.
+
+allOf:
+ - $ref: /schemas/interrupt-controller.yaml#
+
+properties:
+ compatible:
+ items:
+ - enum:
+ - qemu,aplic
+ - const: riscv,aplic
+
+ reg:
+ maxItems: 1
+
+ interrupt-controller: true
+
+ "#interrupt-cells":
+ const: 2
+
+ interrupts-extended:
+ minItems: 1
+ maxItems: 16384
+ description:
+ The given APLIC domain directly injects external interrupts into a set of
+ RISC-V HARTs (or CPUs). Each node pointed to should be a riscv,cpu-intc
+ node, which has a CPU node (i.e. RISC-V HART) as parent.
+
+ msi-parent:
+ description:
+ The given APLIC domain forwards wired interrupts as MSIs to an AIA incoming
+ message signaled interrupt controller (IMSIC). If both the "msi-parent" and
+ "interrupts-extended" properties are present then the APLIC domain supports
+ both MSI mode and Direct mode in HW. In this case, the APLIC driver has to
+ choose between MSI mode and Direct mode.
+
+ riscv,num-sources:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 1
+ maximum: 1023
+ description:
+ Specifies the number of wired interrupt sources supported by this
+ APLIC domain.
+
+ riscv,children:
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ minItems: 1
+ maxItems: 1024
+ items:
+ maxItems: 1
+ description:
+ A list of child APLIC domains for the given APLIC domain. Each child
+ APLIC domain is assigned a child index in increasing order, with the
+ first child APLIC domain assigned child index 0. The APLIC domain child
+ index is used by firmware to delegate interrupts from the given APLIC
+ domain to a particular child APLIC domain.
+
+ riscv,delegation:
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ minItems: 1
+ maxItems: 1024
+ items:
+ items:
+ - description: child APLIC domain phandle
+ - description: first interrupt number of the parent APLIC domain (inclusive)
+ - description: last interrupt number of the parent APLIC domain (inclusive)
+ description:
+ An interrupt delegation list where each entry is a triple consisting
+ of a child APLIC domain phandle, the first interrupt number of the parent
+ APLIC domain, and the last interrupt number of the parent APLIC domain.
+ Firmware must configure the interrupt delegation registers based on
+ this interrupt delegation list.
+
+dependencies:
+ riscv,delegation: [ "riscv,children" ]
+
+required:
+ - compatible
+ - reg
+ - interrupt-controller
+ - "#interrupt-cells"
+ - riscv,num-sources
+
+anyOf:
+ - required:
+ - interrupts-extended
+ - required:
+ - msi-parent
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ // Example 1 (APLIC domains directly injecting interrupt to HARTs):
+
+ interrupt-controller@c000000 {
+ compatible = "qemu,aplic", "riscv,aplic";
+ interrupts-extended = <&cpu1_intc 11>,
+ <&cpu2_intc 11>,
+ <&cpu3_intc 11>,
+ <&cpu4_intc 11>;
+ reg = <0xc000000 0x4080>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+ riscv,num-sources = <63>;
+ riscv,children = <&aplic1>, <&aplic2>;
+ riscv,delegation = <&aplic1 1 63>;
+ };
+
+ aplic1: interrupt-controller@d000000 {
+ compatible = "qemu,aplic", "riscv,aplic";
+ interrupts-extended = <&cpu1_intc 9>,
+ <&cpu2_intc 9>;
+ reg = <0xd000000 0x4080>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+ riscv,num-sources = <63>;
+ };
+
+ aplic2: interrupt-controller@e000000 {
+ compatible = "qemu,aplic", "riscv,aplic";
+ interrupts-extended = <&cpu3_intc 9>,
+ <&cpu4_intc 9>;
+ reg = <0xe000000 0x4080>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+ riscv,num-sources = <63>;
+ };
+
+ - |
+ // Example 2 (APLIC domains forwarding interrupts as MSIs):
+
+ interrupt-controller@c000000 {
+ compatible = "qemu,aplic", "riscv,aplic";
+ msi-parent = <&imsic_mlevel>;
+ reg = <0xc000000 0x4000>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+ riscv,num-sources = <63>;
+ riscv,children = <&aplic3>;
+ riscv,delegation = <&aplic3 1 63>;
+ };
+
+ aplic3: interrupt-controller@d000000 {
+ compatible = "qemu,aplic", "riscv,aplic";
+ msi-parent = <&imsic_slevel>;
+ reg = <0xd000000 0x4000>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+ riscv,num-sources = <63>;
+ };
+...
--
2.34.1
Anup Patel <[email protected]> writes:
> The RISC-V AIA specification is ratified as-per the RISC-V international
> process. The latest ratified AIA specifcation can be found at:
> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>
> At a high-level, the AIA specification adds three things:
> 1) AIA CSRs
> - Improved local interrupt support
> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> - Per-HART MSI controller
> - Support MSI virtualization
> - Support IPI along with virtualization
> 3) Advanced Platform-Level Interrupt Controller (APLIC)
> - Wired interrupt controller
> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> - In Direct-mode, injects external interrupts directly into HARTs
>
> For an overview of the AIA specification, refer the AIA virtualization
> talk at KVM Forum 2022:
> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> https://www.youtube.com/watch?v=r071dL8Z0yo
>
> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
>
> These patches can also be found in the riscv_aia_v12 branch at:
> https://github.com/avpatel/linux.git
>
> Changes since v11:
> - Rebased on Linux-6.8-rc1
> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> MSI handling to per device MSI domains" series by Thomas.
> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> - Updated IMSIC driver to support per-device MSI domains for PCI and
> platform devices.
Thanks for working on this, Anup! I'm still reviewing the patches.
I'm hitting a boot hang in text patching, with this series applied on
6.8-rc2. IPI issues?
I'm booting with U-Boot UEFI.
kernel config:
https://gist.github.com/bjoto/bac563e6dcaab68dba1a5eaf675d51aa
QEMU 8.2.0/OpenSBI 1.4:
| qemu-system-riscv64 \
| -machine virt,acpi=off,aia=aplic-imsic \
| -cpu rv64,v=true,vlen=256,elen=64,h=true,zbkb=on,zbkc=on,zbkx=on,zkr=on,zkt=on,svinval=on,svnapot=on,svpbmt=on \
| -smp 4 \
| -object rng-random,filename=/dev/urandom,id=rng0 \
| -device virtio-rng-device,rng=rng0 \
| -append "root=/dev/vda2 rw earlycon console=tty0 console=ttyS0 panic=-1 oops=panic sysctl.vm.panic_on_oom=1" \
| -m 4G \
| ...
Last lines from the kernel:
| ...
| goldfish_rtc 101000.rtc: registered as rtc0
| goldfish_rtc 101000.rtc: setting system clock to 2024-01-30T06:39:28 UTC (1706596768)
Same kernel boots w/ "-machine virt,acpi=off" (AIA is *not* enabled).
Related or not, I got this splat (once) during an ftrace kselftest:
| # selftests: ftrace: ftracetest-ktap
| Unable to handle kernel paging request at virtual address 5a5a5a5a5a5a5ac2
| Oops [#1]
| Modules linked in: drm fuse i2c_core drm_panel_orientation_quirks backlight dm_mod configfs ip_tables x_tables [last unloaded: trace_printk]
| CPU: 2 PID: 19691 Comm: ls Tainted: G W 6.8.0-rc2-kselftest_plain #1
| Hardware name: riscv-virtio,qemu (DT)
| epc : set_top_events_ownership+0x14/0x5c
| ra : eventfs_get_attr+0x2e/0x50
| epc : ffffffff80533aa4 ra : ffffffff80533b1a sp : ff20000001cebc70
| gp : ffffffff8258b860 tp : ff6000008623e240 t0 : ffffffff80533a98
| t1 : ffffffff825b6b60 t2 : 0000000000000008 s0 : ff20000001cebc80
| s1 : ffffffff8233c000 a0 : ff6000009224e9b8 a1 : ff20000001cebd28
| a2 : ff20000001cebd98 a3 : 000000000000025e a4 : ffffffff80000000
| a5 : 5a5a5a5a5a5a5a5a a6 : 0000000000000000 a7 : 0000000000735049
| s2 : 000000000000025e s3 : ff20000001cebd98 s4 : ff6000009224e9b8
| s5 : ff20000001cebd28 s6 : ffffffffffffff9c s7 : ff6000008ac6a1c0
| s8 : 00007fff9f685d80 s9 : 0000000000000000 s10: 00007fffd4550ef0
| s11: 0000000000000000 t3 : 0000000000000001 t4 : 0000000000000016
| t5 : ffffffff818145be t6 : ff6000009233d77e
| status: 0000000200000120 badaddr: 5a5a5a5a5a5a5ac2 cause: 000000000000000d
| [<ffffffff80533aa4>] set_top_events_ownership+0x14/0x5c
| Code: b297 ffad 82e7 d302 1141 e422 0800 3783 ff85 cb89 (57b8) 8b09
| ---[ end trace 0000000000000000 ]---
This might be unrelated, but the hang above happens on every boot.
Björn
Björn Töpel <[email protected]> writes:
> Anup Patel <[email protected]> writes:
>
>> The RISC-V AIA specification is ratified as-per the RISC-V international
>> process. The latest ratified AIA specifcation can be found at:
>> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>>
>> At a high-level, the AIA specification adds three things:
>> 1) AIA CSRs
>> - Improved local interrupt support
>> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>> - Per-HART MSI controller
>> - Support MSI virtualization
>> - Support IPI along with virtualization
>> 3) Advanced Platform-Level Interrupt Controller (APLIC)
>> - Wired interrupt controller
>> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
>> - In Direct-mode, injects external interrupts directly into HARTs
>>
>> For an overview of the AIA specification, refer the AIA virtualization
>> talk at KVM Forum 2022:
>> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
>> https://www.youtube.com/watch?v=r071dL8Z0yo
>>
>> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
>>
>> These patches can also be found in the riscv_aia_v12 branch at:
>> https://github.com/avpatel/linux.git
>>
>> Changes since v11:
>> - Rebased on Linux-6.8-rc1
>> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
>> MSI handling to per device MSI domains" series by Thomas.
>> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
>> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
>> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
>> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>> - Updated IMSIC driver to support per-device MSI domains for PCI and
>> platform devices.
>
> Thanks for working on this, Anup! I'm still reviewing the patches.
>
> I'm hitting a boot hang in text patching, with this series applied on
> 6.8-rc2. IPI issues?
Not text patching! One cpu spinning in smp_call_function_many_cond() and
the others are in cpu_relax(). Smells like IPI...
On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <[email protected]> wrote:
>
> Björn Töpel <[email protected]> writes:
>
> > Anup Patel <[email protected]> writes:
> >
> >> The RISC-V AIA specification is ratified as-per the RISC-V international
> >> process. The latest ratified AIA specifcation can be found at:
> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >>
> >> At a high-level, the AIA specification adds three things:
> >> 1) AIA CSRs
> >> - Improved local interrupt support
> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> >> - Per-HART MSI controller
> >> - Support MSI virtualization
> >> - Support IPI along with virtualization
> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
> >> - Wired interrupt controller
> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> >> - In Direct-mode, injects external interrupts directly into HARTs
> >>
> >> For an overview of the AIA specification, refer the AIA virtualization
> >> talk at KVM Forum 2022:
> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> >> https://www.youtube.com/watch?v=r071dL8Z0yo
> >>
> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
> >>
> >> These patches can also be found in the riscv_aia_v12 branch at:
> >> https://github.com/avpatel/linux.git
> >>
> >> Changes since v11:
> >> - Rebased on Linux-6.8-rc1
> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> >> MSI handling to per device MSI domains" series by Thomas.
> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
> >> platform devices.
> >
> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >
> > I'm hitting a boot hang in text patching, with this series applied on
> > 6.8-rc2. IPI issues?
>
> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> the others are in cpu_relax(). Smells like IPI...
Can you share the complete bootlog ?
Regards,
Anup
On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <[email protected]> wrote:
>
> Björn Töpel <[email protected]> writes:
>
> > Anup Patel <[email protected]> writes:
> >
> >> The RISC-V AIA specification is ratified as-per the RISC-V international
> >> process. The latest ratified AIA specifcation can be found at:
> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >>
> >> At a high-level, the AIA specification adds three things:
> >> 1) AIA CSRs
> >> - Improved local interrupt support
> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> >> - Per-HART MSI controller
> >> - Support MSI virtualization
> >> - Support IPI along with virtualization
> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
> >> - Wired interrupt controller
> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> >> - In Direct-mode, injects external interrupts directly into HARTs
> >>
> >> For an overview of the AIA specification, refer the AIA virtualization
> >> talk at KVM Forum 2022:
> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> >> https://www.youtube.com/watch?v=r071dL8Z0yo
> >>
> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
> >>
> >> These patches can also be found in the riscv_aia_v12 branch at:
> >> https://github.com/avpatel/linux.git
> >>
> >> Changes since v11:
> >> - Rebased on Linux-6.8-rc1
> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> >> MSI handling to per device MSI domains" series by Thomas.
> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
> >> platform devices.
> >
> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >
> > I'm hitting a boot hang in text patching, with this series applied on
> > 6.8-rc2. IPI issues?
>
> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> the others are in cpu_relax(). Smells like IPI...
I tried bootefi from U-Boot multiple times but can't reproduce the
issue you are seeing.
Here's my boot log ...
$ qemu-system-riscv64 -M virt,aia=aplic-imsic -m 256M -display none
-serial stdio -bios
opensbi/build/platform/generic/firmware/fw_jump.bin -kernel
/u-boot/u-boot.bin -device
loader,file=./build-riscv64/arch/riscv/boot/Image,addr=0x84000000
-drive file=./rootfs_riscv64.ext2,format=raw,id=hd0 -device
virtio-blk-device,drive=hd0 -device virtio-net-device,netdev=eth0
-netdev user,id=eth0 -object rng-random,filename=/dev/urandom,id=rng0
-device virtio-rng-device,rng=rng0 -append "root=/dev/vda rootwait rw
console=ttyS0 earlycon" -smp 8
OpenSBI v1.4-8-gbb90a9e
____ _____ ____ _____
/ __ \ / ____| _ \_ _|
| | | |_ __ ___ _ __ | (___ | |_) || |
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
| |__| | |_) | __/ | | |____) | |_) || |_
\____/| .__/ \___|_| |_|_____/|____/_____|
| |
|_|
Platform Name : riscv-virtio,qemu
Platform Features : medeleg
Platform HART Count : 8
Platform IPI Device : aia-imsic
Platform Timer Device : aclint-mtimer @ 10000000Hz
Platform Console Device : uart8250
Platform HSM Device : ---
Platform PMU Device : ---
Platform Reboot Device : syscon-reboot
Platform Shutdown Device : syscon-poweroff
Platform Suspend Device : ---
Platform CPPC Device : ---
Firmware Base : 0x80000000
Firmware Size : 395 KB
Firmware RW Offset : 0x40000
Firmware RW Size : 139 KB
Firmware Heap Offset : 0x56000
Firmware Heap Size : 51 KB (total), 3 KB (reserved), 12 KB
(used), 36 KB (free)
Firmware Scratch Size : 4096 B (total), 328 B (used), 3768 B (free)
Runtime SBI Version : 2.0
Domain0 Name : root
Domain0 Boot HART : 7
Domain0 HARTs : 0*,1*,2*,3*,4*,5*,6*,7*
Domain0 Region00 : 0x0000000000100000-0x0000000000100fff M:
(I,R,W) S/U: (R,W)
Domain0 Region01 : 0x0000000010000000-0x0000000010000fff M:
(I,R,W) S/U: (R,W)
Domain0 Region02 : 0x000000000c000000-0x000000000c007fff M:
(I,R,W) S/U: ()
Domain0 Region03 : 0x0000000024000000-0x0000000024007fff M:
(I,R,W) S/U: ()
Domain0 Region04 : 0x0000000002000000-0x000000000200ffff M:
(I,R,W) S/U: ()
Domain0 Region05 : 0x0000000080000000-0x000000008003ffff M:
(R,X) S/U: ()
Domain0 Region06 : 0x0000000080040000-0x000000008007ffff M:
(R,W) S/U: ()
Domain0 Region07 : 0x0000000000000000-0xffffffffffffffff M:
() S/U: (R,W,X)
Domain0 Next Address : 0x0000000080200000
Domain0 Next Arg1 : 0x0000000082200000
Domain0 Next Mode : S-mode
Domain0 SysReset : yes
Domain0 SysSuspend : yes
Boot HART ID : 7
Boot HART Domain : root
Boot HART Priv Version : v1.12
Boot HART Base ISA : rv64imafdch
Boot HART ISA Extensions : smaia,sstc,zicntr,zihpm,zicboz,zicbom,sdtrig
Boot HART PMP Count : 16
Boot HART PMP Granularity : 2 bits
Boot HART PMP Address Bits: 54
Boot HART MHPM Info : 16 (0x0007fff8)
Boot HART Debug Triggers : 2 triggers
Boot HART MIDELEG : 0x0000000000001666
Boot HART MEDELEG : 0x0000000000f0b509
U-Boot 2023.10 (Nov 07 2023 - 18:28:29 +0530)
CPU: rv64imafdch_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_smaia_ssaia_sstc_svadu
Model: riscv-virtio,qemu
DRAM: 256 MiB
Core: 37 devices, 16 uclasses, devicetree: board
Flash: 32 MiB
Loading Environment from nowhere... OK
In: serial,usbkbd
Out: serial,vidconsole
Err: serial,vidconsole
No working controllers found
Net: eth0: virtio-net#1
Working FDT set to 8ef1f870
Hit any key to stop autoboot: 0
=> bootefi ${kernel_addr_r}:0x1600000 ${fdtcontroladdr}
No EFI system partition
No EFI system partition
Failed to persist EFI variables
Booting /MemoryMapped(0x0,0x84000000,0x1600000)
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
[ 0.000000] Linux version 6.8.0-rc1-00039-gd9b9d6eb987f
(anup@anup-ubuntu-vm) (riscv64-unknown-linux-gnu-gcc (g2ee5e430018)
12.2.0, GNU ld (GNU Binutils) 2.40.0.20230214) #67 SMP Sat Jan 27
17:20:09 IST 2024
[ 0.000000] random: crng init done
[ 0.000000] Machine model: riscv-virtio,qemu
[ 0.000000] SBI specification v2.0 detected
[ 0.000000] SBI implementation ID=0x1 Version=0x10004
[ 0.000000] SBI TIME extension detected
[ 0.000000] SBI IPI extension detected
[ 0.000000] SBI RFENCE extension detected
[ 0.000000] SBI SRST extension detected
[ 0.000000] SBI DBCN extension detected
[ 0.000000] earlycon: ns16550a0 at MMIO 0x0000000010000000 (options '')
[ 0.000000] printk: legacy bootconsole [ns16550a0] enabled
[ 0.000000] efi: EFI v2.10 by Das U-Boot
[ 0.000000] efi: RTPROP=0x8df05040 SMBIOS=0x8df01000 RNG=0x8c972040
MEMRESERVE=0x8c971040
[ 0.000000] OF: reserved mem:
0x0000000080000000..0x000000008003ffff (256 KiB) nomap non-reusable
mmode_resv0@80000000
[ 0.000000] OF: reserved mem:
0x0000000080040000..0x000000008007ffff (256 KiB) nomap non-reusable
mmode_resv1@80040000
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000080000000-0x000000008fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080000000-0x000000008007ffff]
[ 0.000000] node 0: [mem 0x0000000080080000-0x000000008df00fff]
[ 0.000000] node 0: [mem 0x000000008df01000-0x000000008df01fff]
[ 0.000000] node 0: [mem 0x000000008df02000-0x000000008df04fff]
[ 0.000000] node 0: [mem 0x000000008df05000-0x000000008df05fff]
[ 0.000000] node 0: [mem 0x000000008df06000-0x000000008df06fff]
[ 0.000000] node 0: [mem 0x000000008df07000-0x000000008df08fff]
[ 0.000000] node 0: [mem 0x000000008df09000-0x000000008df09fff]
[ 0.000000] node 0: [mem 0x000000008df0a000-0x000000008df19fff]
[ 0.000000] node 0: [mem 0x000000008df1a000-0x000000008f741fff]
[ 0.000000] node 0: [mem 0x000000008f742000-0x000000008f742fff]
[ 0.000000] node 0: [mem 0x000000008f743000-0x000000008fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000008fffffff]
[ 0.000000] SBI HSM extension detected
[ 0.000000] Falling back to deprecated "riscv,isa"
[ 0.000000] riscv: base ISA extensions acdfhim
[ 0.000000] riscv: ELF capabilities acdfim
[ 0.000000] percpu: Embedded 20 pages/cpu s41464 r8192 d32264 u81920
[ 0.000000] Kernel command line: root=/dev/vda rootwait rw
console=ttyS0 earlycon
[ 0.000000] Dentry cache hash table entries: 32768 (order: 6,
262144 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072
bytes, linear)
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64512
[ 0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] fixmap : 0xff1bfffffea00000 - 0xff1bffffff000000
(6144 kB)
[ 0.000000] pci io : 0xff1bffffff000000 - 0xff1c000000000000
( 16 MB)
[ 0.000000] vmemmap : 0xff1c000000000000 - 0xff20000000000000
(1024 TB)
[ 0.000000] vmalloc : 0xff20000000000000 - 0xff60000000000000
(16384 TB)
[ 0.000000] modules : 0xffffffff01582000 - 0xffffffff80000000
(2026 MB)
[ 0.000000] lowmem : 0xff60000000000000 - 0xff60000010000000
( 256 MB)
[ 0.000000] kernel : 0xffffffff80000000 - 0xffffffffffffffff
(2047 MB)
[ 0.000000] Memory: 217364K/262144K available (9190K kernel code,
4939K rwdata, 4096K rodata, 2252K init, 489K bss, 44780K reserved, 0K
cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[ 0.000000] rcu: Hierarchical RCU implementation.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=8.
[ 0.000000] rcu: RCU debug extended QS entry/exit.
[ 0.000000] Tracing variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] riscv-intc: 64 local interrupts mapped using AIA
[ 0.000000] riscv-imsic: imsics@28000000: providing IPIs using interrupt 1
[ 0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.000000] clocksource: riscv_clocksource: mask:
0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120
ns
[ 0.000087] sched_clock: 64 bits at 10MHz, resolution 100ns, wraps
every 4398046511100ns
[ 0.001406] riscv-timer: Timer interrupt in S-mode is available via
sstc extension
[ 0.007310] Console: colour dummy device 80x25
[ 0.014343] Calibrating delay loop (skipped), value calculated
using timer frequency.. 20.00 BogoMIPS (lpj=40000)
[ 0.018315] pid_max: default: 32768 minimum: 301
[ 0.020982] LSM: initializing lsm=capability,integrity
[ 0.023969] Mount-cache hash table entries: 512 (order: 0, 4096
bytes, linear)
[ 0.025231] Mountpoint-cache hash table entries: 512 (order: 0,
4096 bytes, linear)
[ 0.066845] RCU Tasks Trace: Setting shift to 3 and lim to 1
rcu_task_cb_adjust=1.
[ 0.068829] riscv: ELF compat mode supported
[ 0.069115] ASID allocator using 16 bits (65536 entries)
[ 0.080952] rcu: Hierarchical SRCU implementation.
[ 0.081712] rcu: Max phase no-delay instances is 1000.
[ 0.086381] Remapping and enabling EFI services.
[ 0.093736] smp: Bringing up secondary CPUs ...
[ 0.162264] smp: Brought up 1 node, 8 CPUs
[ 0.186107] devtmpfs: initialized
[ 0.199725] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.200634] futex hash table entries: 2048 (order: 5, 131072 bytes, linear)
[ 0.203482] pinctrl core: initialized pinctrl subsystem
[ 0.213664] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 0.218255] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[ 0.221185] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for
atomic allocations
[ 0.222099] audit: initializing netlink subsys (disabled)
[ 0.228028] audit: type=2000 audit(0.168:1): state=initialized
audit_enabled=0 res=1
[ 0.230906] cpuidle: using governor menu
[ 0.289647] cpu2: Ratio of byte access time to unaligned word
access is 7.20, unaligned accesses are fast
[ 0.289661] cpu3: Ratio of byte access time to unaligned word
access is 5.94, unaligned accesses are fast
[ 0.289652] cpu4: Ratio of byte access time to unaligned word
access is 7.13, unaligned accesses are fast
[ 0.289625] cpu6: Ratio of byte access time to unaligned word
access is 10.28, unaligned accesses are fast
[ 0.289615] cpu1: Ratio of byte access time to unaligned word
access is 8.04, unaligned accesses are fast
[ 0.290252] cpu5: Ratio of byte access time to unaligned word
access is 7.13, unaligned accesses are fast
[ 0.299499] cpu7: Ratio of byte access time to unaligned word
access is 6.58, unaligned accesses are fast
[ 0.326695] cpu0: Ratio of byte access time to unaligned word
access is 7.78, unaligned accesses are fast
[ 0.354371] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.354767] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
[ 0.361699] ACPI: Interpreter disabled.
[ 0.363441] iommu: Default domain type: Translated
[ 0.364215] iommu: DMA domain TLB invalidation policy: strict mode
[ 0.368128] SCSI subsystem initialized
[ 0.371067] usbcore: registered new interface driver usbfs
[ 0.371912] usbcore: registered new interface driver hub
[ 0.372389] usbcore: registered new device driver usb
[ 0.375075] efivars: Registered efivars operations
[ 0.389652] vgaarb: loaded
[ 0.443368] clocksource: Switched to clocksource riscv_clocksource
[ 0.449125] pnp: PnP ACPI: disabled
[ 0.499449] NET: Registered PF_INET protocol family
[ 0.500979] IP idents hash table entries: 4096 (order: 3, 32768
bytes, linear)
[ 0.507062] tcp_listen_portaddr_hash hash table entries: 128
(order: 0, 4096 bytes, linear)
[ 0.507775] Table-perturb hash table entries: 65536 (order: 6,
262144 bytes, linear)
[ 0.508351] TCP established hash table entries: 2048 (order: 2,
16384 bytes, linear)
[ 0.508930] TCP bind hash table entries: 2048 (order: 5, 131072
bytes, linear)
[ 0.509942] TCP: Hash tables configured (established 2048 bind 2048)
[ 0.511459] UDP hash table entries: 256 (order: 2, 24576 bytes, linear)
[ 0.512262] UDP-Lite hash table entries: 256 (order: 2, 24576 bytes, linear)
[ 0.514937] NET: Registered PF_UNIX/PF_LOCAL protocol family
[ 0.521225] RPC: Registered named UNIX socket transport module.
[ 0.521913] RPC: Registered udp transport module.
[ 0.522324] RPC: Registered tcp transport module.
[ 0.522656] RPC: Registered tcp-with-tls transport module.
[ 0.523178] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 0.523787] PCI: CLS 0 bytes, default 64
[ 0.529358] workingset: timestamp_bits=46 max_order=16 bucket_order=0
[ 0.537946] NFS: Registering the id_resolver key type
[ 0.539478] Key type id_resolver registered
[ 0.539918] Key type id_legacy registered
[ 0.540656] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[ 0.542911] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver
Registering...
[ 0.544894] 9p: Installing v9fs 9p2000 file system support
[ 0.548517] NET: Registered PF_ALG protocol family
[ 0.549459] Block layer SCSI generic (bsg) driver version 0.4
loaded (major 245)
[ 0.550658] io scheduler mq-deadline registered
[ 0.552112] io scheduler kyber registered
[ 0.552442] io scheduler bfq registered
[ 0.556517] riscv-imsic: imsics@28000000: hart-index-bits: 3,
guest-index-bits: 0
[ 0.556955] riscv-imsic: imsics@28000000: group-index-bits: 0,
group-index-shift: 24
[ 0.557403] riscv-imsic: imsics@28000000: per-CPU IDs 255 at base
PPN 0x0000000028000000
[ 0.557699] riscv-imsic: imsics@28000000: total 2032 interrupts available
[ 0.561962] pci-host-generic 30000000.pci: host bridge
/soc/pci@30000000 ranges:
[ 0.563422] pci-host-generic 30000000.pci: IO
0x0003000000..0x000300ffff -> 0x0000000000
[ 0.564475] pci-host-generic 30000000.pci: MEM
0x0040000000..0x007fffffff -> 0x0040000000
[ 0.565013] pci-host-generic 30000000.pci: MEM
0x0400000000..0x07ffffffff -> 0x0400000000
[ 0.566349] pci-host-generic 30000000.pci: Memory resource size
exceeds max for 32 bits
[ 0.567633] pci-host-generic 30000000.pci: ECAM at [mem
0x30000000-0x3fffffff] for [bus 00-ff]
[ 0.569300] pci-host-generic 30000000.pci: PCI host bridge to bus 0000:00
[ 0.570172] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.570559] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 0.570969] pci_bus 0000:00: root bus resource [mem 0x40000000-0x7fffffff]
[ 0.571595] pci_bus 0000:00: root bus resource [mem 0x400000000-0x7ffffffff]
[ 0.573646] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
conventional PCI endpoint
[ 0.654069] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.659475] SuperH (H)SCI(F) driver initialized
[ 0.675004] loop: module loaded
[ 0.680024] e1000e: Intel(R) PRO/1000 Network Driver
[ 0.680162] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 0.684590] usbcore: registered new interface driver uas
[ 0.685245] usbcore: registered new interface driver usb-storage
[ 0.686530] mousedev: PS/2 mouse device common for all mice
[ 0.693125] syscon-poweroff poweroff: pm_power_off already claimed
for sbi_srst_power_off
[ 0.694774] syscon-poweroff: probe of poweroff failed with error -16
[ 0.698092] sdhci: Secure Digital Host Controller Interface driver
[ 0.698484] sdhci: Copyright(c) Pierre Ossman
[ 0.699333] Synopsys Designware Multimedia Card Interface Driver
[ 0.699869] sdhci-pltfm: SDHCI platform and OF driver helper
[ 0.700673] usbcore: registered new interface driver usbhid
[ 0.700920] usbhid: USB HID core driver
[ 0.701501] riscv-pmu-sbi: SBI PMU extension is available
[ 0.702263] riscv-pmu-sbi: 16 firmware and 18 hardware counters
[ 0.702618] riscv-pmu-sbi: Perf sampling/filtering is not supported
as sscof extension is not available
[ 0.709934] NET: Registered PF_INET6 protocol family
[ 0.723647] Segment Routing with IPv6
[ 0.724210] In-situ OAM (IOAM) with IPv6
[ 0.724882] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ 0.727936] NET: Registered PF_PACKET protocol family
[ 0.729289] 9pnet: Installing 9P2000 support
[ 0.729796] Key type dns_resolver registered
[ 0.763394] debug_vm_pgtable: [debug_vm_pgtable ]:
Validating architecture page table helpers
[ 0.772054] riscv-aplic d000000.aplic: 96 interrupts forwared to
MSI base 0x0000000028000000
[ 0.775578] virtio_blk virtio0: 1/0/0 default/read/poll queues
[ 0.782792] virtio_blk virtio0: [vda] 65536 512-byte logical blocks
(33.6 MB/32.0 MiB)
[ 0.827635] printk: legacy console [ttyS0] disabled
[ 0.830784] 10000000.serial: ttyS0 at MMIO 0x10000000 (irq = 14,
base_baud = 230400) is a 16550A
[ 0.833076] printk: legacy console [ttyS0] enabled
[ 0.833076] printk: legacy console [ttyS0] enabled
[ 0.833856] printk: legacy bootconsole [ns16550a0] disabled
[ 0.833856] printk: legacy bootconsole [ns16550a0] disabled
[ 0.843499] goldfish_rtc 101000.rtc: registered as rtc0
[ 0.844980] goldfish_rtc 101000.rtc: setting system clock to
2024-01-30T10:19:33 UTC (1706609973)
[ 0.848495] clk: Disabling unused clocks
[ 0.884046] EXT4-fs (vda): warning: mounting unchecked fs, running
e2fsck is recommended
[ 0.891369] EXT4-fs (vda): mounted filesystem
00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode:
disabled.
[ 0.892199] ext4 filesystem being mounted at /root supports
timestamps until 2038-01-19 (0x7fffffff)
[ 0.892644] VFS: Mounted root (ext4 filesystem) on device 254:0.
[ 0.895564] devtmpfs: mounted
[ 0.986847] Freeing unused kernel image (initmem) memory: 2252K
[ 0.988406] Run /sbin/init as init process
mount: mounting devtmpfs on /dev failed: Device or resource busy
_ _
| ||_|
| | _ ____ _ _ _ _
| || | _ \| | | |\ \/ /
| || | | | | |_| |/ \
|_||_|_| |_|\____|\_/\_/
Busybox Rootfs
Please press Enter to activate this console.
/ #
/ # cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  10:        103        116         58        214         96         47         78         52  RISC-V INTC   5 Edge      riscv-timer
  11:          0         44          0          0          0          0          0          0  APLIC-MSI-d000000.aplic   8 Level     virtio0
  12:          0          0          0          0          0          0          0          0  APLIC-MSI-d000000.aplic   7 Level     virtio1
  13:          0          0          0          6          0          0          0          0  APLIC-MSI-d000000.aplic   6 Level     virtio2
  14:          0          0          0          0         64          0          0          0  APLIC-MSI-d000000.aplic  10 Level     ttyS0
  15:          0          0          0          0          0          0          0          0  APLIC-MSI-d000000.aplic  11 Level     101000.rtc
IPI0:          4          9         12          6          5         10         13          7  Rescheduling interrupts
IPI1:        605        477        442        315        392        434        405        417  Function call interrupts
IPI2:          0          0          0          0          0          0          0          0  CPU stop interrupts
IPI3:          0          0          0          0          0          0          0          0  CPU stop (for crash dump) interrupts
IPI4:          0          0          0          0          0          0          0          0  IRQ work interrupts
IPI5:          0          0          0          0          0          0          0          0  Timer broadcast interrupts
/ #
/ #
/ # poweroff
/ # umount: devtmpfs busy - remounted read-only
[ 24.504316] EXT4-fs (vda): re-mounted
00000000-0000-0000-0000-000000000000 ro. Quota mode: disabled.
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to[ 26.543142] reboot: Power down
Regards,
Anup
Anup Patel <[email protected]> writes:
> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <[email protected]> wrote:
>>
>> Björn Töpel <[email protected]> writes:
>>
>> > Anup Patel <[email protected]> writes:
>> >
>> >> The RISC-V AIA specification is ratified as-per the RISC-V international
>> >> process. The latest ratified AIA specifcation can be found at:
>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>> >>
>> >> At a high-level, the AIA specification adds three things:
>> >> 1) AIA CSRs
>> >> - Improved local interrupt support
>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>> >> - Per-HART MSI controller
>> >> - Support MSI virtualization
>> >> - Support IPI along with virtualization
>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
>> >> - Wired interrupt controller
>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
>> >> - In Direct-mode, injects external interrupts directly into HARTs
>> >>
>> >> For an overview of the AIA specification, refer the AIA virtualization
>> >> talk at KVM Forum 2022:
>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
>> >> https://www.youtube.com/watch?v=r071dL8Z0yo
>> >>
>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
>> >>
>> >> These patches can also be found in the riscv_aia_v12 branch at:
>> >> https://github.com/avpatel/linux.git
>> >>
>> >> Changes since v11:
>> >> - Rebased on Linux-6.8-rc1
>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
>> >> MSI handling to per device MSI domains" series by Thomas.
>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
>> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
>> >> platform devices.
>> >
>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >
>> > I'm hitting a boot hang in text patching, with this series applied on
>> > 6.8-rc2. IPI issues?
>>
>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>> the others are in cpu_relax(). Smells like IPI...
>
> Can you share the complete bootlog ?
Here: https://gist.github.com/bjoto/04a580568378f3b5483af07cd9d22501
Anup Patel <[email protected]> writes:
> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <[email protected]> wrote:
>>
>> Björn Töpel <[email protected]> writes:
>>
>> > Anup Patel <[email protected]> writes:
>> >
>> >> The RISC-V AIA specification is ratified as-per the RISC-V international
>> >> process. The latest ratified AIA specifcation can be found at:
>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>> >>
>> >> At a high-level, the AIA specification adds three things:
>> >> 1) AIA CSRs
>> >> - Improved local interrupt support
>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>> >> - Per-HART MSI controller
>> >> - Support MSI virtualization
>> >> - Support IPI along with virtualization
>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
>> >> - Wired interrupt controller
>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
>> >> - In Direct-mode, injects external interrupts directly into HARTs
>> >>
>> >> For an overview of the AIA specification, refer the AIA virtualization
>> >> talk at KVM Forum 2022:
>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
>> >> https://www.youtube.com/watch?v=r071dL8Z0yo
>> >>
>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
>> >>
>> >> These patches can also be found in the riscv_aia_v12 branch at:
>> >> https://github.com/avpatel/linux.git
>> >>
>> >> Changes since v11:
>> >> - Rebased on Linux-6.8-rc1
>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
>> >> MSI handling to per device MSI domains" series by Thomas.
>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
>> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
>> >> platform devices.
>> >
>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >
>> > I'm hitting a boot hang in text patching, with this series applied on
>> > 6.8-rc2. IPI issues?
>>
>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>> the others are in cpu_relax(). Smells like IPI...
>
> I tried bootefi from U-Boot multiple times but can't reproduce the
> issue you are seeing.
Thanks! I can reproduce without EFI, and with a simpler command line:
qemu-system-riscv64 \
-bios /path/to/fw_dynamic.bin \
-kernel /path/to/Image \
-append 'earlycon console=tty0 console=ttyS0' \
-machine virt,aia=aplic-imsic \
-no-reboot -nodefaults -nographic \
-smp 4 \
-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-rng-device,rng=rng0 \
-m 4G -chardev stdio,id=char0 -serial chardev:char0
I can reproduce with your upstream riscv_aia_v12 plus the config in the
gist [1], and all latest QEMU/OpenSBI:
QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
panicking about missing root mount ;-))
Björn
[1] https://gist.githubusercontent.com/bjoto/bac563e6dcaab68dba1a5eaf675d51aa/raw/ff6208fb17f27819dbe97ace7d034f385d2db657/gistfile1.txt
Björn Töpel <[email protected]> writes:
> Anup Patel <[email protected]> writes:
>
>> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <[email protected]> wrote:
>>>
>>> Björn Töpel <[email protected]> writes:
>>>
>>> > Anup Patel <[email protected]> writes:
>>> >
>>> >> The RISC-V AIA specification is ratified as-per the RISC-V international
>>> >> process. The latest ratified AIA specifcation can be found at:
>>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>>> >>
>>> >> At a high-level, the AIA specification adds three things:
>>> >> 1) AIA CSRs
>>> >> - Improved local interrupt support
>>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>>> >> - Per-HART MSI controller
>>> >> - Support MSI virtualization
>>> >> - Support IPI along with virtualization
>>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
>>> >> - Wired interrupt controller
>>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
>>> >> - In Direct-mode, injects external interrupts directly into HARTs
>>> >>
>>> >> For an overview of the AIA specification, refer the AIA virtualization
>>> >> talk at KVM Forum 2022:
>>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
>>> >> https://www.youtube.com/watch?v=r071dL8Z0yo
>>> >>
>>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
>>> >>
>>> >> These patches can also be found in the riscv_aia_v12 branch at:
>>> >> https://github.com/avpatel/linux.git
>>> >>
>>> >> Changes since v11:
>>> >> - Rebased on Linux-6.8-rc1
>>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
>>> >> MSI handling to per device MSI domains" series by Thomas.
>>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
>>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
>>> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
>>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
>>> >> platform devices.
>>> >
>>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>>> >
>>> > I'm hitting a boot hang in text patching, with this series applied on
>>> > 6.8-rc2. IPI issues?
>>>
>>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>>> the others are in cpu_relax(). Smells like IPI...
>>
>> I tried bootefi from U-Boot multiple times but can't reproduce the
>> issue you are seeing.
>
> Thanks! I can reproduce without EFI, and simpler command-line:
>
> qemu-system-riscv64 \
> -bios /path/to/fw_dynamic.bin \
> -kernel /path/to/Image \
> -append 'earlycon console=tty0 console=ttyS0' \
> -machine virt,aia=aplic-imsic \
> -no-reboot -nodefaults -nographic \
> -smp 4 \
> -object rng-random,filename=/dev/urandom,id=rng0 \
> -device virtio-rng-device,rng=rng0 \
> -m 4G -chardev stdio,id=char0 -serial chardev:char0
>
> I can reproduce with your upstream riscv_aia_v12 plus the config in the
> gist [1], and all latest QEMU/OpenSBI:
>
> QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
> OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
> Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
>
> Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
> panicking about missing root mount ;-))
More context: the hang is during a late initcall, where an ftrace direct
(register_ftrace_direct()) modification is done.
Stop machine is used to call into __ftrace_modify_call(), and from there into
the arch-specific patch_text_nosync(), where flush_icache_range() hangs in
flush_icache_all(): from "on_each_cpu(ipi_remote_fence_i, NULL, 1);" into
on_each_cpu_cond_mask()'s "smp_call_function_many_cond(mask, func, info,
scf_flags, cond_func);", which never returns from "csd_lock_wait(csd)"
right before the end of the function.
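For reference, the flush path in question looks roughly like this (a
simplified, hand-trimmed sketch of arch/riscv/mm/cacheflush.c as of this
series, comments added; not verbatim kernel source):

  /* arch/riscv/mm/cacheflush.c -- simplified sketch, not verbatim */
  static void ipi_remote_fence_i(void *info)
  {
          /* Runs on each hart that receives the IPI. */
          local_flush_icache_all();
  }

  void flush_icache_all(void)
  {
          /* fence.i on the calling hart. */
          local_flush_icache_all();

          if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
                  /* Firmware (SBI) fences the remote harts. */
                  sbi_remote_fence_i(NULL);
          else
                  /* IPI every online CPU and wait for completion. */
                  on_each_cpu(ipi_remote_fence_i, NULL, 1);
  }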
Any ideas? Disabling CONFIG_HID_BPF, which does the early ftrace code
patching, fixes the boot hang, but it does seem related to IPI...
On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <[email protected]> wrote:
>
> Björn Töpel <[email protected]> writes:
>
> > Anup Patel <[email protected]> writes:
> >
> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <[email protected]> wrote:
> >>>
> >>> Björn Töpel <[email protected]> writes:
> >>>
> >>> > Anup Patel <[email protected]> writes:
> >>> >
> >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international
> >>> >> process. The latest ratified AIA specifcation can be found at:
> >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >>> >>
> >>> >> At a high-level, the AIA specification adds three things:
> >>> >> 1) AIA CSRs
> >>> >> - Improved local interrupt support
> >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> >>> >> - Per-HART MSI controller
> >>> >> - Support MSI virtualization
> >>> >> - Support IPI along with virtualization
> >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
> >>> >> - Wired interrupt controller
> >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> >>> >> - In Direct-mode, injects external interrupts directly into HARTs
> >>> >>
> >>> >> For an overview of the AIA specification, refer the AIA virtualization
> >>> >> talk at KVM Forum 2022:
> >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo
> >>> >>
> >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
> >>> >>
> >>> >> These patches can also be found in the riscv_aia_v12 branch at:
> >>> >> https://github.com/avpatel/linux.git
> >>> >>
> >>> >> Changes since v11:
> >>> >> - Rebased on Linux-6.8-rc1
> >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> >>> >> MSI handling to per device MSI domains" series by Thomas.
> >>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> >>> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
> >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
> >>> >> platform devices.
> >>> >
> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >>> >
> >>> > I'm hitting a boot hang in text patching, with this series applied on
> >>> > 6.8-rc2. IPI issues?
> >>>
> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> >>> the others are in cpu_relax(). Smells like IPI...
> >>
> >> I tried bootefi from U-Boot multiple times but can't reproduce the
> >> issue you are seeing.
> >
> > Thanks! I can reproduce without EFI, and simpler command-line:
> >
> > qemu-system-riscv64 \
> > -bios /path/to/fw_dynamic.bin \
> > -kernel /path/to/Image \
> > -append 'earlycon console=tty0 console=ttyS0' \
> > -machine virt,aia=aplic-imsic \
> > -no-reboot -nodefaults -nographic \
> > -smp 4 \
> > -object rng-random,filename=/dev/urandom,id=rng0 \
> > -device virtio-rng-device,rng=rng0 \
> > -m 4G -chardev stdio,id=char0 -serial chardev:char0
> >
> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
> > gist [1], and all latest QEMU/OpenSBI:
> >
> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
> >
> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
> > panicking about missing root mount ;-))
>
> More context; The hang is during a late initcall, where an ftrace direct
> (register_ftrace_direct()) modification is done.
>
> Stop machine is used to call into __ftrace_modify_call(). Then into the
> arch specific patch_text_nosync(), where flush_icache_range() hangs in
> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
> right before the end of the function.
>
> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
> patching fixes the boot hang, but it does seem related to IPI...
Thanks for the details, I will debug more at my end.
Regards,
Anup
On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <[email protected]> wrote:
>
> Björn Töpel <[email protected]> writes:
>
> > Anup Patel <[email protected]> writes:
> >
> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <[email protected]> wrote:
> >>>
> >>> Björn Töpel <[email protected]> writes:
> >>>
> >>> > Anup Patel <[email protected]> writes:
> >>> >
> >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international
> >>> >> process. The latest ratified AIA specifcation can be found at:
> >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >>> >>
> >>> >> At a high-level, the AIA specification adds three things:
> >>> >> 1) AIA CSRs
> >>> >> - Improved local interrupt support
> >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> >>> >> - Per-HART MSI controller
> >>> >> - Support MSI virtualization
> >>> >> - Support IPI along with virtualization
> >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
> >>> >> - Wired interrupt controller
> >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> >>> >> - In Direct-mode, injects external interrupts directly into HARTs
> >>> >>
> >>> >> For an overview of the AIA specification, refer the AIA virtualization
> >>> >> talk at KVM Forum 2022:
> >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo
> >>> >>
> >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
> >>> >>
> >>> >> These patches can also be found in the riscv_aia_v12 branch at:
> >>> >> https://github.com/avpatel/linux.git
> >>> >>
> >>> >> Changes since v11:
> >>> >> - Rebased on Linux-6.8-rc1
> >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> >>> >> MSI handling to per device MSI domains" series by Thomas.
> >>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> >>> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
> >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
> >>> >> platform devices.
> >>> >
> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >>> >
> >>> > I'm hitting a boot hang in text patching, with this series applied on
> >>> > 6.8-rc2. IPI issues?
> >>>
> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> >>> the others are in cpu_relax(). Smells like IPI...
> >>
> >> I tried bootefi from U-Boot multiple times but can't reproduce the
> >> issue you are seeing.
> >
> > Thanks! I can reproduce without EFI, and simpler command-line:
> >
> > qemu-system-riscv64 \
> > -bios /path/to/fw_dynamic.bin \
> > -kernel /path/to/Image \
> > -append 'earlycon console=tty0 console=ttyS0' \
> > -machine virt,aia=aplic-imsic \
> > -no-reboot -nodefaults -nographic \
> > -smp 4 \
> > -object rng-random,filename=/dev/urandom,id=rng0 \
> > -device virtio-rng-device,rng=rng0 \
> > -m 4G -chardev stdio,id=char0 -serial chardev:char0
> >
> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
> > gist [1], and all latest QEMU/OpenSBI:
> >
> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
> >
> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
> > panicking about missing root mount ;-))
>
> More context; The hang is during a late initcall, where an ftrace direct
> (register_ftrace_direct()) modification is done.
>
> Stop machine is used to call into __ftrace_modify_call(). Then into the
> arch specific patch_text_nosync(), where flush_icache_range() hangs in
> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
> right before the end of the function.
>
> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
> patching fixes the boot hang, but it does seem related to IPI...
>
Looks like flush_icache_all() does not use the IPIs (on_each_cpu()
and friends) correctly.
On the other hand, flush_icache_mm() does the right thing by doing a local
flush on the current CPU and an IPI-based flush on the other CPUs.
Can you try the following patch?
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 55a34f2020a8..a3dfbe4de832 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)
void flush_icache_all(void)
{
+ cpumask_t others;
+
local_flush_icache_all();
+ cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
+ if (cpumask_empty(&others))
+ return;
+
if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
- sbi_remote_fence_i(NULL);
+ sbi_remote_fence_i(&others);
else
- on_each_cpu(ipi_remote_fence_i, NULL, 1);
+ on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
}
EXPORT_SYMBOL(flush_icache_all);
Regards,
Anup
Anup Patel <[email protected]> writes:
> On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <[email protected]> wrote:
>>
>> Björn Töpel <[email protected]> writes:
>>
>> > Anup Patel <[email protected]> writes:
>> >
>> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <[email protected]> wrote:
>> >>>
>> >>> Björn Töpel <[email protected]> writes:
>> >>>
>> >>> > Anup Patel <[email protected]> writes:
>> >>> >
>> >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international
>> >>> >> process. The latest ratified AIA specifcation can be found at:
>> >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>> >>> >>
>> >>> >> At a high-level, the AIA specification adds three things:
>> >>> >> 1) AIA CSRs
>> >>> >> - Improved local interrupt support
>> >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>> >>> >> - Per-HART MSI controller
>> >>> >> - Support MSI virtualization
>> >>> >> - Support IPI along with virtualization
>> >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
>> >>> >> - Wired interrupt controller
>> >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
>> >>> >> - In Direct-mode, injects external interrupts directly into HARTs
>> >>> >>
>> >>> >> For an overview of the AIA specification, refer the AIA virtualization
>> >>> >> talk at KVM Forum 2022:
>> >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
>> >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo
>> >>> >>
>> >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
>> >>> >>
>> >>> >> These patches can also be found in the riscv_aia_v12 branch at:
>> >>> >> https://github.com/avpatel/linux.git
>> >>> >>
>> >>> >> Changes since v11:
>> >>> >> - Rebased on Linux-6.8-rc1
>> >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
>> >>> >> MSI handling to per device MSI domains" series by Thomas.
>> >>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
>> >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
>> >>> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
>> >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>> >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
>> >>> >> platform devices.
>> >>> >
>> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >>> >
>> >>> > I'm hitting a boot hang in text patching, with this series applied on
>> >>> > 6.8-rc2. IPI issues?
>> >>>
>> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>> >>> the others are in cpu_relax(). Smells like IPI...
>> >>
>> >> I tried bootefi from U-Boot multiple times but can't reproduce the
>> >> issue you are seeing.
>> >
>> > Thanks! I can reproduce without EFI, and simpler command-line:
>> >
>> > qemu-system-riscv64 \
>> > -bios /path/to/fw_dynamic.bin \
>> > -kernel /path/to/Image \
>> > -append 'earlycon console=tty0 console=ttyS0' \
>> > -machine virt,aia=aplic-imsic \
>> > -no-reboot -nodefaults -nographic \
>> > -smp 4 \
>> > -object rng-random,filename=/dev/urandom,id=rng0 \
>> > -device virtio-rng-device,rng=rng0 \
>> > -m 4G -chardev stdio,id=char0 -serial chardev:char0
>> >
>> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
>> > gist [1], and all latest QEMU/OpenSBI:
>> >
>> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
>> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
>> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
>> >
>> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (ie.
>> > panicking about missing root mount ;-))
>>
>> More context; The hang is during a late initcall, where an ftrace direct
>> (register_ftrace_direct()) modification is done.
>>
>> Stop machine is used to call into __ftrace_modify_call(). Then into the
>> arch specific patch_text_nosync(), where flush_icache_range() hangs in
>> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
>> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
>> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
>> right before the end of the function.
>>
>> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
>> patching fixes the boot hang, but it does seem related to IPI...
>>
> Looks like flush_icache_all() does not use the IPIs (on_each_cpu()
> and friends) correctly.
>
> On other hand, the flush_icache_mm() does the right thing by
> doing local flush on the current CPU and IPI based flush on other
> CPUs.
>
> Can you try the following patch ?
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 55a34f2020a8..a3dfbe4de832 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)
>
> void flush_icache_all(void)
> {
> + cpumask_t others;
> +
> local_flush_icache_all();
>
> + cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
> + if (cpumask_empty(&others))
> + return;
> +
> if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
> - sbi_remote_fence_i(NULL);
> + sbi_remote_fence_i(&others);
> else
> - on_each_cpu(ipi_remote_fence_i, NULL, 1);
> + on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
> }
> EXPORT_SYMBOL(flush_icache_all);
Unfortunately, I see the same hang. LMK if you'd like me to try anything
else.
Björn
Hi Anup,
On Sun, Jan 28, 2024 at 12:24 AM Anup Patel <[email protected]> wrote:
>
> The RISC-V advanced interrupt architecture (AIA) specification defines
> advanced platform-level interrupt controller (APLIC) which has two modes
> of operation: 1) Direct mode and 2) MSI mode.
> (For more details, refer https://github.com/riscv/riscv-aia)
>
> In APLIC direct-mode, wired interrupts are forwarded to CPUs (or HARTs)
> as a local external interrupt.
>
> We add a platform irqchip driver for the RISC-V APLIC direct-mode to
> support RISC-V platforms having only wired interrupts.
>
> Signed-off-by: Anup Patel <[email protected]>
> ---
> drivers/irqchip/Kconfig | 5 +
> drivers/irqchip/Makefile | 1 +
> drivers/irqchip/irq-riscv-aplic-direct.c | 343 +++++++++++++++++++++++
> drivers/irqchip/irq-riscv-aplic-main.c | 232 +++++++++++++++
> drivers/irqchip/irq-riscv-aplic-main.h | 45 +++
> include/linux/irqchip/riscv-aplic.h | 119 ++++++++
> 6 files changed, 745 insertions(+)
> create mode 100644 drivers/irqchip/irq-riscv-aplic-direct.c
> create mode 100644 drivers/irqchip/irq-riscv-aplic-main.c
> create mode 100644 drivers/irqchip/irq-riscv-aplic-main.h
> create mode 100644 include/linux/irqchip/riscv-aplic.h
>
> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> index 2fc0cb32341a..dbc8811d3764 100644
> --- a/drivers/irqchip/Kconfig
> +++ b/drivers/irqchip/Kconfig
> @@ -546,6 +546,11 @@ config SIFIVE_PLIC
> select IRQ_DOMAIN_HIERARCHY
> select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
>
> +config RISCV_APLIC
> + bool
> + depends on RISCV
> + select IRQ_DOMAIN_HIERARCHY
> +
> config RISCV_IMSIC
> bool
> depends on RISCV
> diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
> index abca445a3229..7f8289790ed8 100644
> --- a/drivers/irqchip/Makefile
> +++ b/drivers/irqchip/Makefile
> @@ -95,6 +95,7 @@ obj-$(CONFIG_QCOM_MPM) += irq-qcom-mpm.o
> obj-$(CONFIG_CSKY_MPINTC) += irq-csky-mpintc.o
> obj-$(CONFIG_CSKY_APB_INTC) += irq-csky-apb-intc.o
> obj-$(CONFIG_RISCV_INTC) += irq-riscv-intc.o
> +obj-$(CONFIG_RISCV_APLIC) += irq-riscv-aplic-main.o irq-riscv-aplic-direct.o
> obj-$(CONFIG_RISCV_IMSIC) += irq-riscv-imsic-state.o irq-riscv-imsic-early.o irq-riscv-imsic-platform.o
> obj-$(CONFIG_SIFIVE_PLIC) += irq-sifive-plic.o
> obj-$(CONFIG_IMX_IRQSTEER) += irq-imx-irqsteer.o
> diff --git a/drivers/irqchip/irq-riscv-aplic-direct.c b/drivers/irqchip/irq-riscv-aplic-direct.c
> new file mode 100644
> index 000000000000..9ed2666bfb5e
> --- /dev/null
> +++ b/drivers/irqchip/irq-riscv-aplic-direct.c
> @@ -0,0 +1,343 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> + * Copyright (C) 2022 Ventana Micro Systems Inc.
> + */
> +
> +#include <linux/bitops.h>
> +#include <linux/cpu.h>
> +#include <linux/interrupt.h>
> +#include <linux/irqchip.h>
> +#include <linux/irqchip/chained_irq.h>
> +#include <linux/irqchip/riscv-aplic.h>
> +#include <linux/module.h>
> +#include <linux/of_address.h>
> +#include <linux/printk.h>
> +#include <linux/smp.h>
> +
> +#include "irq-riscv-aplic-main.h"
> +
> +#define APLIC_DISABLE_IDELIVERY 0
> +#define APLIC_ENABLE_IDELIVERY 1
> +#define APLIC_DISABLE_ITHRESHOLD 1
> +#define APLIC_ENABLE_ITHRESHOLD 0
> +
> +struct aplic_direct {
> + struct aplic_priv priv;
> + struct irq_domain *irqdomain;
> + struct cpumask lmask;
> +};
> +
> +struct aplic_idc {
> + unsigned int hart_index;
> + void __iomem *regs;
> + struct aplic_direct *direct;
> +};
> +
> +static unsigned int aplic_direct_parent_irq;
> +static DEFINE_PER_CPU(struct aplic_idc, aplic_idcs);
> +
> +static void aplic_direct_irq_eoi(struct irq_data *d)
> +{
> + /*
> + * The fasteoi_handler requires irq_eoi() callback hence
> + * provide a dummy handler.
> + */
> +}
> +
> +#ifdef CONFIG_SMP
> +static int aplic_direct_set_affinity(struct irq_data *d,
> + const struct cpumask *mask_val, bool force)
> +{
> + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> + struct aplic_direct *direct =
> + container_of(priv, struct aplic_direct, priv);
> + struct aplic_idc *idc;
> + unsigned int cpu, val;
> + struct cpumask amask;
> + void __iomem *target;
> +
> + cpumask_and(&amask, &direct->lmask, mask_val);
> +
> + if (force)
> + cpu = cpumask_first(&amask);
> + else
> + cpu = cpumask_any_and(&amask, cpu_online_mask);
> +
> + if (cpu >= nr_cpu_ids)
> + return -EINVAL;
> +
> + idc = per_cpu_ptr(&aplic_idcs, cpu);
> + target = priv->regs + APLIC_TARGET_BASE;
> + target += (d->hwirq - 1) * sizeof(u32);
> + val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
> + val <<= APLIC_TARGET_HART_IDX_SHIFT;
> + val |= APLIC_DEFAULT_PRIORITY;
> + writel(val, target);
> +
> + irq_data_update_effective_affinity(d, cpumask_of(cpu));
> +
> + return IRQ_SET_MASK_OK_DONE;
> +}
> +#endif
> +
> +static struct irq_chip aplic_direct_chip = {
> + .name = "APLIC-DIRECT",
> + .irq_mask = aplic_irq_mask,
> + .irq_unmask = aplic_irq_unmask,
> + .irq_set_type = aplic_irq_set_type,
> + .irq_eoi = aplic_direct_irq_eoi,
> +#ifdef CONFIG_SMP
> + .irq_set_affinity = aplic_direct_set_affinity,
> +#endif
> + .flags = IRQCHIP_SET_TYPE_MASKED |
> + IRQCHIP_SKIP_SET_WAKE |
> + IRQCHIP_MASK_ON_SUSPEND,
> +};
> +
> +static int aplic_direct_irqdomain_translate(struct irq_domain *d,
> + struct irq_fwspec *fwspec,
> + unsigned long *hwirq,
> + unsigned int *type)
> +{
> + struct aplic_priv *priv = d->host_data;
> +
> + return aplic_irqdomain_translate(fwspec, priv->gsi_base,
> + hwirq, type);
> +}
> +
> +static int aplic_direct_irqdomain_alloc(struct irq_domain *domain,
> + unsigned int virq, unsigned int nr_irqs,
> + void *arg)
> +{
> + int i, ret;
> + unsigned int type;
> + irq_hw_number_t hwirq;
> + struct irq_fwspec *fwspec = arg;
> + struct aplic_priv *priv = domain->host_data;
> + struct aplic_direct *direct =
> + container_of(priv, struct aplic_direct, priv);
> +
> + ret = aplic_irqdomain_translate(fwspec, priv->gsi_base,
> + &hwirq, &type);
> + if (ret)
> + return ret;
> +
> + for (i = 0; i < nr_irqs; i++) {
> + irq_domain_set_info(domain, virq + i, hwirq + i,
> + &aplic_direct_chip, priv,
> + handle_fasteoi_irq, NULL, NULL);
> + irq_set_affinity(virq + i, &direct->lmask);
> + /* See the reason described in aplic_msi_irqdomain_alloc() */
> + irq_set_status_flags(virq + i, IRQ_DISABLE_UNLAZY);
> + }
> +
> + return 0;
> +}
> +
> +static const struct irq_domain_ops aplic_direct_irqdomain_ops = {
> + .translate = aplic_direct_irqdomain_translate,
> + .alloc = aplic_direct_irqdomain_alloc,
> + .free = irq_domain_free_irqs_top,
> +};
> +
> +/*
> + * To handle APLIC direct interrupts, we just read the CLAIMI register,
> + * which returns the highest-priority pending interrupt and clears its
> + * pending bit. This process is repeated until the CLAIMI register
> + * returns zero.
> + */
> +static void aplic_direct_handle_irq(struct irq_desc *desc)
> +{
> + struct aplic_idc *idc = this_cpu_ptr(&aplic_idcs);
> + struct irq_chip *chip = irq_desc_get_chip(desc);
> + struct irq_domain *irqdomain = idc->direct->irqdomain;
> + irq_hw_number_t hw_irq;
> + int irq;
> +
> + chained_irq_enter(chip, desc);
> +
> + while ((hw_irq = readl(idc->regs + APLIC_IDC_CLAIMI))) {
> + hw_irq = hw_irq >> APLIC_IDC_TOPI_ID_SHIFT;
> + irq = irq_find_mapping(irqdomain, hw_irq);
> +
> + if (unlikely(irq <= 0))
> + dev_warn_ratelimited(idc->direct->priv.dev,
> + "hw_irq %lu mapping not found\n",
> + hw_irq);
> + else
> + generic_handle_irq(irq);
> + }
> +
> + chained_irq_exit(chip, desc);
> +}
> +
> +static void aplic_idc_set_delivery(struct aplic_idc *idc, bool en)
> +{
> + u32 de = (en) ? APLIC_ENABLE_IDELIVERY : APLIC_DISABLE_IDELIVERY;
> + u32 th = (en) ? APLIC_ENABLE_ITHRESHOLD : APLIC_DISABLE_ITHRESHOLD;
> +
> + /* Priority must be less than threshold for interrupt triggering */
> + writel(th, idc->regs + APLIC_IDC_ITHRESHOLD);
> +
> + /* Delivery must be set to 1 for interrupt triggering */
> + writel(de, idc->regs + APLIC_IDC_IDELIVERY);
> +}
> +
> +static int aplic_direct_dying_cpu(unsigned int cpu)
> +{
> + if (aplic_direct_parent_irq)
> + disable_percpu_irq(aplic_direct_parent_irq);
> +
> + return 0;
> +}
> +
> +static int aplic_direct_starting_cpu(unsigned int cpu)
> +{
> + if (aplic_direct_parent_irq)
> + enable_percpu_irq(aplic_direct_parent_irq,
> + irq_get_trigger_type(aplic_direct_parent_irq));
> +
> + return 0;
> +}
> +
> +static int aplic_direct_parse_parent_hwirq(struct device *dev,
> + u32 index, u32 *parent_hwirq,
> + unsigned long *parent_hartid)
> +{
> + struct of_phandle_args parent;
> + int rc;
> +
> + /*
> + * Currently, only OF fwnode is supported so extend this
> + * function for ACPI support.
> + */
> + if (!is_of_node(dev->fwnode))
> + return -EINVAL;
> +
> + rc = of_irq_parse_one(to_of_node(dev->fwnode), index, &parent);
> + if (rc)
> + return rc;
> +
> + rc = riscv_of_parent_hartid(parent.np, parent_hartid);
> + if (rc)
> + return rc;
> +
> + *parent_hwirq = parent.args[0];
> + return 0;
> +}
> +
> +int aplic_direct_setup(struct device *dev, void __iomem *regs)
> +{
> + int i, j, rc, cpu, setup_count = 0;
> + struct aplic_direct *direct;
> + struct aplic_priv *priv;
> + struct irq_domain *domain;
> + unsigned long hartid;
> + struct aplic_idc *idc;
> + u32 val, hwirq;
> +
> + direct = kzalloc(sizeof(*direct), GFP_KERNEL);
> + if (!direct)
> + return -ENOMEM;
> + priv = &direct->priv;
> +
> + rc = aplic_setup_priv(priv, dev, regs);
> + if (rc) {
> + dev_err(dev, "failed to create APLIC context\n");
> + kfree(direct);
> + return rc;
> + }
> +
> + /* Setup per-CPU IDC and target CPU mask */
> + for (i = 0; i < priv->nr_idcs; i++) {
> + rc = aplic_direct_parse_parent_hwirq(dev, i, &hwirq, &hartid);
> + if (rc) {
> + dev_warn(dev, "parent irq for IDC%d not found\n", i);
> + continue;
> + }
> +
> + /*
> + * Skip interrupts other than external interrupts for
> + * current privilege level.
> + */
> + if (hwirq != RV_IRQ_EXT)
> + continue;
> +
> + cpu = riscv_hartid_to_cpuid(hartid);
> + if (cpu < 0) {
> + dev_warn(dev, "invalid cpuid for IDC%d\n", i);
> + continue;
> + }
> +
> + cpumask_set_cpu(cpu, &direct->lmask);
> +
> + idc = per_cpu_ptr(&aplic_idcs, cpu);
> + idc->hart_index = i;
> + idc->regs = priv->regs + APLIC_IDC_BASE + i * APLIC_IDC_SIZE;
> + idc->direct = direct;
> +
> + aplic_idc_set_delivery(idc, true);
> +
> + /*
> + * Boot cpu might not have APLIC hart_index = 0 so check
> + * and update target registers of all interrupts.
> + */
IIUC, the use of smp_processor_id() has to be protected by turning off
preemption, so maybe consider adding:
+ preempt_disable();
> + if (cpu == smp_processor_id() && idc->hart_index) {
> + val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
> + val <<= APLIC_TARGET_HART_IDX_SHIFT;
> + val |= APLIC_DEFAULT_PRIORITY;
> + for (j = 1; j <= priv->nr_irqs; j++)
> + writel(val, priv->regs + APLIC_TARGET_BASE +
> + (j - 1) * sizeof(u32));
> + }
, and here:
+ preempt_enable();
Or use the get_cpu()/put_cpu() variant to guard the use of the processor id.
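For example, the get_cpu()/put_cpu() form could look roughly like this (just a
sketch; "this_cpu" is a new local introduced only for illustration):
	this_cpu = get_cpu();
	if (cpu == this_cpu && idc->hart_index) {
		/* ... update the target registers as above ... */
	}
	put_cpu();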
> +
> + setup_count++;
> + }
> +
> + /* Find parent domain and register chained handler */
> + domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
> + DOMAIN_BUS_ANY);
> + if (!aplic_direct_parent_irq && domain) {
> + aplic_direct_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
> + if (aplic_direct_parent_irq) {
> + irq_set_chained_handler(aplic_direct_parent_irq,
> + aplic_direct_handle_irq);
> +
> + /*
> + * Setup CPUHP notifier to enable parent
> + * interrupt on all CPUs
> + */
> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> + "irqchip/riscv/aplic:starting",
> + aplic_direct_starting_cpu,
> + aplic_direct_dying_cpu);
> + }
> + }
> +
> + /* Fail if we were not able to setup IDC for any CPU */
> + if (!setup_count) {
> + kfree(direct);
> + return -ENODEV;
> + }
> +
> + /* Setup global config and interrupt delivery */
> + aplic_init_hw_global(priv, false);
> +
> + /* Create irq domain instance for the APLIC */
> + direct->irqdomain = irq_domain_create_linear(dev->fwnode,
> + priv->nr_irqs + 1,
> + &aplic_direct_irqdomain_ops,
> + priv);
> + if (!direct->irqdomain) {
> + dev_err(dev, "failed to create direct irq domain\n");
> + kfree(direct);
> + return -ENOMEM;
> + }
> +
> + /* Advertise the interrupt controller */
> + dev_info(dev, "%d interrupts directly connected to %d CPUs\n",
> + priv->nr_irqs, priv->nr_idcs);
> +
> + return 0;
> +}
> diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
> new file mode 100644
> index 000000000000..87450708a733
> --- /dev/null
> +++ b/drivers/irqchip/irq-riscv-aplic-main.c
> @@ -0,0 +1,232 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> + * Copyright (C) 2022 Ventana Micro Systems Inc.
> + */
> +
> +#include <linux/of.h>
> +#include <linux/of_irq.h>
> +#include <linux/printk.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/irqchip/riscv-aplic.h>
> +
> +#include "irq-riscv-aplic-main.h"
> +
> +void aplic_irq_unmask(struct irq_data *d)
> +{
> + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> +
> + writel(d->hwirq, priv->regs + APLIC_SETIENUM);
> +}
> +
> +void aplic_irq_mask(struct irq_data *d)
> +{
> + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> +
> + writel(d->hwirq, priv->regs + APLIC_CLRIENUM);
> +}
> +
> +int aplic_irq_set_type(struct irq_data *d, unsigned int type)
> +{
> + u32 val = 0;
> + void __iomem *sourcecfg;
> + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> +
> + switch (type) {
> + case IRQ_TYPE_NONE:
> + val = APLIC_SOURCECFG_SM_INACTIVE;
> + break;
> + case IRQ_TYPE_LEVEL_LOW:
> + val = APLIC_SOURCECFG_SM_LEVEL_LOW;
> + break;
> + case IRQ_TYPE_LEVEL_HIGH:
> + val = APLIC_SOURCECFG_SM_LEVEL_HIGH;
> + break;
> + case IRQ_TYPE_EDGE_FALLING:
> + val = APLIC_SOURCECFG_SM_EDGE_FALL;
> + break;
> + case IRQ_TYPE_EDGE_RISING:
> + val = APLIC_SOURCECFG_SM_EDGE_RISE;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + sourcecfg = priv->regs + APLIC_SOURCECFG_BASE;
> + sourcecfg += (d->hwirq - 1) * sizeof(u32);
> + writel(val, sourcecfg);
> +
> + return 0;
> +}
> +
> +int aplic_irqdomain_translate(struct irq_fwspec *fwspec, u32 gsi_base,
> + unsigned long *hwirq, unsigned int *type)
> +{
> + if (WARN_ON(fwspec->param_count < 2))
> + return -EINVAL;
> + if (WARN_ON(!fwspec->param[0]))
> + return -EINVAL;
> +
> + /* For DT, gsi_base is always zero. */
> + *hwirq = fwspec->param[0] - gsi_base;
> + *type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
> +
> + WARN_ON(*type == IRQ_TYPE_NONE);
> +
> + return 0;
> +}
> +
> +void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode)
> +{
> + u32 val;
> +#ifdef CONFIG_RISCV_M_MODE
> + u32 valH;
> +
> + if (msi_mode) {
> + val = priv->msicfg.base_ppn;
> + valH = ((u64)priv->msicfg.base_ppn >> 32) &
> + APLIC_xMSICFGADDRH_BAPPN_MASK;
> + valH |= (priv->msicfg.lhxw & APLIC_xMSICFGADDRH_LHXW_MASK)
> + << APLIC_xMSICFGADDRH_LHXW_SHIFT;
> + valH |= (priv->msicfg.hhxw & APLIC_xMSICFGADDRH_HHXW_MASK)
> + << APLIC_xMSICFGADDRH_HHXW_SHIFT;
> + valH |= (priv->msicfg.lhxs & APLIC_xMSICFGADDRH_LHXS_MASK)
> + << APLIC_xMSICFGADDRH_LHXS_SHIFT;
> + valH |= (priv->msicfg.hhxs & APLIC_xMSICFGADDRH_HHXS_MASK)
> + << APLIC_xMSICFGADDRH_HHXS_SHIFT;
> + writel(val, priv->regs + APLIC_xMSICFGADDR);
> + writel(valH, priv->regs + APLIC_xMSICFGADDRH);
> + }
> +#endif
> +
> + /* Setup APLIC domaincfg register */
> + val = readl(priv->regs + APLIC_DOMAINCFG);
> + val |= APLIC_DOMAINCFG_IE;
> + if (msi_mode)
> + val |= APLIC_DOMAINCFG_DM;
> + writel(val, priv->regs + APLIC_DOMAINCFG);
> + if (readl(priv->regs + APLIC_DOMAINCFG) != val)
> + dev_warn(priv->dev, "unable to write 0x%x in domaincfg\n",
> + val);
> +}
> +
> +static void aplic_init_hw_irqs(struct aplic_priv *priv)
> +{
> + int i;
> +
> + /* Disable all interrupts */
> + for (i = 0; i <= priv->nr_irqs; i += 32)
> + writel(-1U, priv->regs + APLIC_CLRIE_BASE +
> + (i / 32) * sizeof(u32));
> +
> + /* Set interrupt type and default priority for all interrupts */
> + for (i = 1; i <= priv->nr_irqs; i++) {
> + writel(0, priv->regs + APLIC_SOURCECFG_BASE +
> + (i - 1) * sizeof(u32));
> + writel(APLIC_DEFAULT_PRIORITY,
> + priv->regs + APLIC_TARGET_BASE +
> + (i - 1) * sizeof(u32));
> + }
> +
> + /* Clear APLIC domaincfg */
> + writel(0, priv->regs + APLIC_DOMAINCFG);
> +}
> +
> +int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
> + void __iomem *regs)
> +{
> + struct of_phandle_args parent;
> + int rc;
> +
> + /*
> + * Currently, only OF fwnode is supported so extend this
> + * function for ACPI support.
> + */
> + if (!is_of_node(dev->fwnode))
> + return -EINVAL;
> +
> + /* Save device pointer and register base */
> + priv->dev = dev;
> + priv->regs = regs;
> +
> + /* Find out number of interrupt sources */
> + rc = of_property_read_u32(to_of_node(dev->fwnode),
> + "riscv,num-sources",
> + &priv->nr_irqs);
> + if (rc) {
> + dev_err(dev, "failed to get number of interrupt sources\n");
> + return rc;
> + }
> +
> + /*
> + * Find out number of IDCs based on parent interrupts
> + *
> + * If "msi-parent" property is present then we ignore the
> + * APLIC IDCs which forces the APLIC driver to use MSI mode.
> + */
> + if (!of_property_present(to_of_node(dev->fwnode), "msi-parent")) {
> + while (!of_irq_parse_one(to_of_node(dev->fwnode),
> + priv->nr_idcs, &parent))
> + priv->nr_idcs++;
> + }
> +
> + /* Setup initial state APLIC interrupts */
> + aplic_init_hw_irqs(priv);
> +
> + return 0;
> +}
> +
> +static int aplic_probe(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + bool msi_mode = false;
> + struct resource *res;
> + void __iomem *regs;
> + int rc;
> +
> + /* Map the MMIO registers */
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (!res) {
> + dev_err(dev, "failed to get MMIO resource\n");
> + return -EINVAL;
> + }
> + regs = devm_ioremap(&pdev->dev, res->start, resource_size(res));
> + if (!regs) {
> + dev_err(dev, "failed to map MMIO registers\n");
> + return -ENOMEM;
> + }
> +
> + /*
> + * If msi-parent property is present then setup APLIC MSI
> + * mode otherwise setup APLIC direct mode.
> + */
> + if (is_of_node(dev->fwnode))
> + msi_mode = of_property_present(to_of_node(dev->fwnode),
> + "msi-parent");
> + if (msi_mode)
> + rc = -ENODEV;
> + else
> + rc = aplic_direct_setup(dev, regs);
> + if (rc) {
> + dev_err(dev, "failed setup APLIC in %s mode\n",
> + msi_mode ? "MSI" : "direct");
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> +static const struct of_device_id aplic_match[] = {
> + { .compatible = "riscv,aplic" },
> + {}
> +};
> +
> +static struct platform_driver aplic_driver = {
> + .driver = {
> + .name = "riscv-aplic",
> + .of_match_table = aplic_match,
> + },
> + .probe = aplic_probe,
> +};
> +builtin_platform_driver(aplic_driver);
> diff --git a/drivers/irqchip/irq-riscv-aplic-main.h b/drivers/irqchip/irq-riscv-aplic-main.h
> new file mode 100644
> index 000000000000..474a04229334
> --- /dev/null
> +++ b/drivers/irqchip/irq-riscv-aplic-main.h
> @@ -0,0 +1,45 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> + * Copyright (C) 2022 Ventana Micro Systems Inc.
> + */
> +
> +#ifndef _IRQ_RISCV_APLIC_MAIN_H
> +#define _IRQ_RISCV_APLIC_MAIN_H
> +
> +#include <linux/device.h>
> +#include <linux/io.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/fwnode.h>
> +
> +#define APLIC_DEFAULT_PRIORITY 1
> +
> +struct aplic_msicfg {
> + phys_addr_t base_ppn;
> + u32 hhxs;
> + u32 hhxw;
> + u32 lhxs;
> + u32 lhxw;
> +};
> +
> +struct aplic_priv {
> + struct device *dev;
> + u32 gsi_base;
> + u32 nr_irqs;
> + u32 nr_idcs;
> + void __iomem *regs;
> + struct aplic_msicfg msicfg;
> +};
> +
> +void aplic_irq_unmask(struct irq_data *d);
> +void aplic_irq_mask(struct irq_data *d);
> +int aplic_irq_set_type(struct irq_data *d, unsigned int type);
> +int aplic_irqdomain_translate(struct irq_fwspec *fwspec, u32 gsi_base,
> + unsigned long *hwirq, unsigned int *type);
> +void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode);
> +int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
> + void __iomem *regs);
> +int aplic_direct_setup(struct device *dev, void __iomem *regs);
> +
> +#endif
> diff --git a/include/linux/irqchip/riscv-aplic.h b/include/linux/irqchip/riscv-aplic.h
> new file mode 100644
> index 000000000000..97e198ea0109
> --- /dev/null
> +++ b/include/linux/irqchip/riscv-aplic.h
> @@ -0,0 +1,119 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> + * Copyright (C) 2022 Ventana Micro Systems Inc.
> + */
> +#ifndef __LINUX_IRQCHIP_RISCV_APLIC_H
> +#define __LINUX_IRQCHIP_RISCV_APLIC_H
> +
> +#include <linux/bitops.h>
> +
> +#define APLIC_MAX_IDC BIT(14)
> +#define APLIC_MAX_SOURCE 1024
> +
> +#define APLIC_DOMAINCFG 0x0000
> +#define APLIC_DOMAINCFG_RDONLY 0x80000000
> +#define APLIC_DOMAINCFG_IE BIT(8)
> +#define APLIC_DOMAINCFG_DM BIT(2)
> +#define APLIC_DOMAINCFG_BE BIT(0)
> +
> +#define APLIC_SOURCECFG_BASE 0x0004
> +#define APLIC_SOURCECFG_D BIT(10)
> +#define APLIC_SOURCECFG_CHILDIDX_MASK 0x000003ff
> +#define APLIC_SOURCECFG_SM_MASK 0x00000007
> +#define APLIC_SOURCECFG_SM_INACTIVE 0x0
> +#define APLIC_SOURCECFG_SM_DETACH 0x1
> +#define APLIC_SOURCECFG_SM_EDGE_RISE 0x4
> +#define APLIC_SOURCECFG_SM_EDGE_FALL 0x5
> +#define APLIC_SOURCECFG_SM_LEVEL_HIGH 0x6
> +#define APLIC_SOURCECFG_SM_LEVEL_LOW 0x7
> +
> +#define APLIC_MMSICFGADDR 0x1bc0
> +#define APLIC_MMSICFGADDRH 0x1bc4
> +#define APLIC_SMSICFGADDR 0x1bc8
> +#define APLIC_SMSICFGADDRH 0x1bcc
> +
> +#ifdef CONFIG_RISCV_M_MODE
> +#define APLIC_xMSICFGADDR APLIC_MMSICFGADDR
> +#define APLIC_xMSICFGADDRH APLIC_MMSICFGADDRH
> +#else
> +#define APLIC_xMSICFGADDR APLIC_SMSICFGADDR
> +#define APLIC_xMSICFGADDRH APLIC_SMSICFGADDRH
> +#endif
> +
> +#define APLIC_xMSICFGADDRH_L BIT(31)
> +#define APLIC_xMSICFGADDRH_HHXS_MASK 0x1f
> +#define APLIC_xMSICFGADDRH_HHXS_SHIFT 24
> +#define APLIC_xMSICFGADDRH_LHXS_MASK 0x7
> +#define APLIC_xMSICFGADDRH_LHXS_SHIFT 20
> +#define APLIC_xMSICFGADDRH_HHXW_MASK 0x7
> +#define APLIC_xMSICFGADDRH_HHXW_SHIFT 16
> +#define APLIC_xMSICFGADDRH_LHXW_MASK 0xf
> +#define APLIC_xMSICFGADDRH_LHXW_SHIFT 12
> +#define APLIC_xMSICFGADDRH_BAPPN_MASK 0xfff
> +
> +#define APLIC_xMSICFGADDR_PPN_SHIFT 12
> +
> +#define APLIC_xMSICFGADDR_PPN_HART(__lhxs) \
> + (BIT(__lhxs) - 1)
> +
> +#define APLIC_xMSICFGADDR_PPN_LHX_MASK(__lhxw) \
> + (BIT(__lhxw) - 1)
> +#define APLIC_xMSICFGADDR_PPN_LHX_SHIFT(__lhxs) \
> + ((__lhxs))
> +#define APLIC_xMSICFGADDR_PPN_LHX(__lhxw, __lhxs) \
> + (APLIC_xMSICFGADDR_PPN_LHX_MASK(__lhxw) << \
> + APLIC_xMSICFGADDR_PPN_LHX_SHIFT(__lhxs))
> +
> +#define APLIC_xMSICFGADDR_PPN_HHX_MASK(__hhxw) \
> + (BIT(__hhxw) - 1)
> +#define APLIC_xMSICFGADDR_PPN_HHX_SHIFT(__hhxs) \
> + ((__hhxs) + APLIC_xMSICFGADDR_PPN_SHIFT)
> +#define APLIC_xMSICFGADDR_PPN_HHX(__hhxw, __hhxs) \
> + (APLIC_xMSICFGADDR_PPN_HHX_MASK(__hhxw) << \
> + APLIC_xMSICFGADDR_PPN_HHX_SHIFT(__hhxs))
> +
> +#define APLIC_IRQBITS_PER_REG 32
> +
> +#define APLIC_SETIP_BASE 0x1c00
> +#define APLIC_SETIPNUM 0x1cdc
> +
> +#define APLIC_CLRIP_BASE 0x1d00
> +#define APLIC_CLRIPNUM 0x1ddc
> +
> +#define APLIC_SETIE_BASE 0x1e00
> +#define APLIC_SETIENUM 0x1edc
> +
> +#define APLIC_CLRIE_BASE 0x1f00
> +#define APLIC_CLRIENUM 0x1fdc
> +
> +#define APLIC_SETIPNUM_LE 0x2000
> +#define APLIC_SETIPNUM_BE 0x2004
> +
> +#define APLIC_GENMSI 0x3000
> +
> +#define APLIC_TARGET_BASE 0x3004
> +#define APLIC_TARGET_HART_IDX_SHIFT 18
> +#define APLIC_TARGET_HART_IDX_MASK 0x3fff
> +#define APLIC_TARGET_GUEST_IDX_SHIFT 12
> +#define APLIC_TARGET_GUEST_IDX_MASK 0x3f
> +#define APLIC_TARGET_IPRIO_MASK 0xff
> +#define APLIC_TARGET_EIID_MASK 0x7ff
> +
> +#define APLIC_IDC_BASE 0x4000
> +#define APLIC_IDC_SIZE 32
> +
> +#define APLIC_IDC_IDELIVERY 0x00
> +
> +#define APLIC_IDC_IFORCE 0x04
> +
> +#define APLIC_IDC_ITHRESHOLD 0x08
> +
> +#define APLIC_IDC_TOPI 0x18
> +#define APLIC_IDC_TOPI_ID_SHIFT 16
> +#define APLIC_IDC_TOPI_ID_MASK 0x3ff
> +#define APLIC_IDC_TOPI_PRIO_MASK 0xff
> +
> +#define APLIC_IDC_CLAIMI 0x1c
> +
> +#endif
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Thanks,
Andy
On Tue, Jan 30, 2024 at 11:19 PM Björn Töpel <[email protected]> wrote:
>
> Anup Patel <[email protected]> writes:
>
> > On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <[email protected]> wrote:
> >>
> >> Björn Töpel <[email protected]> writes:
> >>
> >> > Anup Patel <[email protected]> writes:
> >> >
> >> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <[email protected]> wrote:
> >> >>>
> >> >>> Björn Töpel <[email protected]> writes:
> >> >>>
> >> >>> > Anup Patel <[email protected]> writes:
> >> >>> >
> >> >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international
> >> >>> >> process. The latest ratified AIA specifcation can be found at:
> >> >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >> >>> >>
> >> >>> >> At a high-level, the AIA specification adds three things:
> >> >>> >> 1) AIA CSRs
> >> >>> >> - Improved local interrupt support
> >> >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> >> >>> >> - Per-HART MSI controller
> >> >>> >> - Support MSI virtualization
> >> >>> >> - Support IPI along with virtualization
> >> >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
> >> >>> >> - Wired interrupt controller
> >> >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> >> >>> >> - In Direct-mode, injects external interrupts directly into HARTs
> >> >>> >>
> >> >>> >> For an overview of the AIA specification, refer the AIA virtualization
> >> >>> >> talk at KVM Forum 2022:
> >> >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> >> >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo
> >> >>> >>
> >> >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
> >> >>> >>
> >> >>> >> These patches can also be found in the riscv_aia_v12 branch at:
> >> >>> >> https://github.com/avpatel/linux.git
> >> >>> >>
> >> >>> >> Changes since v11:
> >> >>> >> - Rebased on Linux-6.8-rc1
> >> >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> >> >>> >> MSI handling to per device MSI domains" series by Thomas.
> >> >>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> >> >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> >> >>> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
> >> >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >> >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
> >> >>> >> platform devices.
> >> >>> >
> >> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >> >>> >
> >> >>> > I'm hitting a boot hang in text patching, with this series applied on
> >> >>> > 6.8-rc2. IPI issues?
> >> >>>
> >> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> >> >>> the others are in cpu_relax(). Smells like IPI...
> >> >>
> >> >> I tried bootefi from U-Boot multiple times but can't reproduce the
> >> >> issue you are seeing.
> >> >
> >> > Thanks! I can reproduce without EFI, and simpler command-line:
> >> >
> >> > qemu-system-riscv64 \
> >> > -bios /path/to/fw_dynamic.bin \
> >> > -kernel /path/to/Image \
> >> > -append 'earlycon console=tty0 console=ttyS0' \
> >> > -machine virt,aia=aplic-imsic \
> >> > -no-reboot -nodefaults -nographic \
> >> > -smp 4 \
> >> > -object rng-random,filename=/dev/urandom,id=rng0 \
> >> > -device virtio-rng-device,rng=rng0 \
> >> > -m 4G -chardev stdio,id=char0 -serial chardev:char0
> >> >
> >> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
> >> > gist [1], and all latest QEMU/OpenSBI:
> >> >
> >> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
> >> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
> >> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
> >> >
> >> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
> >> > panicking about missing root mount ;-))
> >>
> >> More context; The hang is during a late initcall, where an ftrace direct
> >> (register_ftrace_direct()) modification is done.
> >>
> >> Stop machine is used to call into __ftrace_modify_call(). Then into the
> >> arch specific patch_text_nosync(), where flush_icache_range() hangs in
> >> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
> >> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
> >> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
> >> right before the end of the function.
> >>
> >> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
> >> patching fixes the boot hang, but it does seem related to IPI...
> >>
> > Looks like flush_icache_all() does not use the IPIs (on_each_cpu()
> > and friends) correctly.
> >
> > On the other hand, the flush_icache_mm() does the right thing by
> > doing local flush on the current CPU and IPI based flush on other
> > CPUs.
> >
> > Can you try the following patch ?
> >
> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > index 55a34f2020a8..a3dfbe4de832 100644
> > --- a/arch/riscv/mm/cacheflush.c
> > +++ b/arch/riscv/mm/cacheflush.c
> > @@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)
> >
> > void flush_icache_all(void)
> > {
> > + cpumask_t others;
> > +
> > local_flush_icache_all();
> >
> > + cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
> > + if (cpumask_empty(&others))
> > + return;
> > +
> > if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
> > - sbi_remote_fence_i(NULL);
> > + sbi_remote_fence_i(&others);
> > else
> > - on_each_cpu(ipi_remote_fence_i, NULL, 1);
> > + on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
> > }
> > EXPORT_SYMBOL(flush_icache_all);
>
> Unfortunately, I see the same hang. LMK if you'd like me to try anything
> else.
I was able to reproduce this at my end but I had to use your config.
Digging further, it seems the issue is observed only when we use
in-kernel IPIs for cache flushing (instead of SBI calls) along with
some of the tracers (or debugging features) enabled. With the tracers
(or debug features) disabled, we don't see any issue. In fact, the
upstream defconfig works perfectly fine with the AIA drivers and
in-kernel IPIs.
It seems AIA-based in-kernel IPIs are exposing some other issue
with the RISC-V kernel. I will debug more to find the root cause.
Regards,
Anup
Anup Patel <[email protected]> writes:
> On Tue, Jan 30, 2024 at 11:19 PM Björn Töpel <[email protected]> wrote:
>>
>> Anup Patel <[email protected]> writes:
>>
>> > On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <[email protected]> wrote:
>> >>
>> >> Björn Töpel <[email protected]> writes:
>> >>
>> >> > Anup Patel <[email protected]> writes:
>> >> >
>> >> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <[email protected]> wrote:
>> >> >>>
>> >> >>> Björn Töpel <[email protected]> writes:
>> >> >>>
>> >> >>> > Anup Patel <[email protected]> writes:
>> >> >>> >
>> >> >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international
>> >> >>> >> process. The latest ratified AIA specifcation can be found at:
>> >> >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>> >> >>> >>
>> >> >>> >> At a high-level, the AIA specification adds three things:
>> >> >>> >> 1) AIA CSRs
>> >> >>> >> - Improved local interrupt support
>> >> >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>> >> >>> >> - Per-HART MSI controller
>> >> >>> >> - Support MSI virtualization
>> >> >>> >> - Support IPI along with virtualization
>> >> >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
>> >> >>> >> - Wired interrupt controller
>> >> >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
>> >> >>> >> - In Direct-mode, injects external interrupts directly into HARTs
>> >> >>> >>
>> >> >>> >> For an overview of the AIA specification, refer the AIA virtualization
>> >> >>> >> talk at KVM Forum 2022:
>> >> >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
>> >> >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo
>> >> >>> >>
>> >> >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
>> >> >>> >>
>> >> >>> >> These patches can also be found in the riscv_aia_v12 branch at:
>> >> >>> >> https://github.com/avpatel/linux.git
>> >> >>> >>
>> >> >>> >> Changes since v11:
>> >> >>> >> - Rebased on Linux-6.8-rc1
>> >> >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
>> >> >>> >> MSI handling to per device MSI domains" series by Thomas.
>> >> >>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
>> >> >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
>> >> >>> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
>> >> >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>> >> >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
>> >> >>> >> platform devices.
>> >> >>> >
>> >> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >> >>> >
>> >> >>> > I'm hitting a boot hang in text patching, with this series applied on
>> >> >>> > 6.8-rc2. IPI issues?
>> >> >>>
>> >> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>> >> >>> the others are in cpu_relax(). Smells like IPI...
>> >> >>
>> >> >> I tried bootefi from U-Boot multiple times but can't reproduce the
>> >> >> issue you are seeing.
>> >> >
>> >> > Thanks! I can reproduce without EFI, and simpler command-line:
>> >> >
>> >> > qemu-system-riscv64 \
>> >> > -bios /path/to/fw_dynamic.bin \
>> >> > -kernel /path/to/Image \
>> >> > -append 'earlycon console=tty0 console=ttyS0' \
>> >> > -machine virt,aia=aplic-imsic \
>> >> > -no-reboot -nodefaults -nographic \
>> >> > -smp 4 \
>> >> > -object rng-random,filename=/dev/urandom,id=rng0 \
>> >> > -device virtio-rng-device,rng=rng0 \
>> >> > -m 4G -chardev stdio,id=char0 -serial chardev:char0
>> >> >
>> >> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
>> >> > gist [1], and all latest QEMU/OpenSBI:
>> >> >
>> >> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
>> >> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
>> >> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
>> >> >
>> >> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
>> >> > panicking about missing root mount ;-))
>> >>
>> >> More context; The hang is during a late initcall, where an ftrace direct
>> >> (register_ftrace_direct()) modification is done.
>> >>
>> >> Stop machine is used to call into __ftrace_modify_call(). Then into the
>> >> arch specific patch_text_nosync(), where flush_icache_range() hangs in
>> >> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
>> >> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
>> >> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
>> >> right before the end of the function.
>> >>
>> >> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
>> >> patching fixes the boot hang, but it does seem related to IPI...
>> >>
>> > Looks like flush_icache_all() does not use the IPIs (on_each_cpu()
>> > and friends) correctly.
>> >
>> > On the other hand, the flush_icache_mm() does the right thing by
>> > doing local flush on the current CPU and IPI based flush on other
>> > CPUs.
>> >
>> > Can you try the following patch ?
>> >
>> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>> > index 55a34f2020a8..a3dfbe4de832 100644
>> > --- a/arch/riscv/mm/cacheflush.c
>> > +++ b/arch/riscv/mm/cacheflush.c
>> > @@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)
>> >
>> > void flush_icache_all(void)
>> > {
>> > + cpumask_t others;
>> > +
>> > local_flush_icache_all();
>> >
>> > + cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
>> > + if (cpumask_empty(&others))
>> > + return;
>> > +
>> > if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
>> > - sbi_remote_fence_i(NULL);
>> > + sbi_remote_fence_i(&others);
>> > else
>> > - on_each_cpu(ipi_remote_fence_i, NULL, 1);
>> > + on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
>> > }
>> > EXPORT_SYMBOL(flush_icache_all);
>>
>> Unfortunately, I see the same hang. LMK if you'd like me to try anything
>> else.
>
> I was able to reproduce this at my end but I had to use your config.
>
> Digging further, it seems the issue is observed only when we use
> in-kernel IPIs for cache flushing (instead of SBI calls) along with
> some of the tracers (or debugging features) enabled. With the tracers
> (or debug features) disabled we don't see any issue. In fact, the
> upstream defconfig works perfectly fine with AIA drivers and
> in-kernel IPIs.
Same here. I only see the issue for *one* scenario. Other than that
scenario, AIA is working fine! We're doing ftrace text patching, and I
wonder if this is the issue. RISC-V (unfortunately) still relies on
stop_machine() text patching (which will change!).
Again, the hang is in stop_machine() context, where interrupts should
very much be disabled, right? So, triggering an IPI will be impossible.
Dumping mstatus in QEMU:
| mstatus 0000000a000000a0
| mstatus 0000000a000000a0
| mstatus 0000000a000000a0
| mstatus 0000000a000000a0
Indeed sstatus.SIE is 0.
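(In the low byte, 0xa0 = 0b1010_0000: only MPIE (bit 7) and SPIE (bit 5) are
set, so MIE (bit 3) and SIE (bit 1) are both clear.)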
Seems like the bug is that text patching is trying to issue an IPI:
| [<ffffffff801145d4>] smp_call_function_many_cond+0x81e/0x8ba
| [<ffffffff80114716>] on_each_cpu_cond_mask+0x3e/0xde
| [<ffffffff80013968>] flush_icache_all+0x98/0xc4
| [<ffffffff80009c26>] patch_text_nosync+0x7c/0x146
| [<ffffffff80ef9116>] __ftrace_modify_call.constprop.0+0xca/0x120
| [<ffffffff80ef918c>] ftrace_update_ftrace_func+0x20/0x40
| [<ffffffff80efb8ac>] ftrace_modify_all_code+0x5a/0x1d8
| [<ffffffff80efba50>] __ftrace_modify_code+0x26/0x42
| [<ffffffff80131734>] multi_cpu_stop+0x14e/0x1d8
| [<ffffffff8013107a>] cpu_stopper_thread+0x9e/0x182
| [<ffffffff80077a04>] smpboot_thread_fn+0xf8/0x1d2
| [<ffffffff800718fc>] kthread+0xe8/0x108
| [<ffffffff80f1cde6>] ret_from_fork+0xe/0x20
Björn
On 27/01/2024 17:17, Anup Patel wrote:
> The RISC-V advanced interrupt architecture (AIA) specification defines
> advanced platform-level interrupt controller (APLIC) which has two modes
> of operation: 1) Direct mode and 2) MSI mode.
> (For more details, refer https://github.com/riscv/riscv-aia)
>
> In APLIC direct-mode, wired interrupts are forwarded to CPUs (or HARTs)
> as local external interrupts.
>
> We add a platform irqchip driver for the RISC-V APLIC direct-mode to
> support RISC-V platforms having only wired interrupts.
>
> Signed-off-by: Anup Patel <[email protected]>
> ---
> drivers/irqchip/Kconfig | 5 +
> drivers/irqchip/Makefile | 1 +
> drivers/irqchip/irq-riscv-aplic-direct.c | 343 +++++++++++++++++++++++
> drivers/irqchip/irq-riscv-aplic-main.c | 232 +++++++++++++++
> drivers/irqchip/irq-riscv-aplic-main.h | 45 +++
> include/linux/irqchip/riscv-aplic.h | 119 ++++++++
> 6 files changed, 745 insertions(+)
> create mode 100644 drivers/irqchip/irq-riscv-aplic-direct.c
> create mode 100644 drivers/irqchip/irq-riscv-aplic-main.c
> create mode 100644 drivers/irqchip/irq-riscv-aplic-main.h
> create mode 100644 include/linux/irqchip/riscv-aplic.h
>
> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> index 2fc0cb32341a..dbc8811d3764 100644
> --- a/drivers/irqchip/Kconfig
> +++ b/drivers/irqchip/Kconfig
> @@ -546,6 +546,11 @@ config SIFIVE_PLIC
> select IRQ_DOMAIN_HIERARCHY
> select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
>
> +config RISCV_APLIC
> + bool
> + depends on RISCV
> + select IRQ_DOMAIN_HIERARCHY
> +
> config RISCV_IMSIC
> bool
> depends on RISCV
> diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
> index abca445a3229..7f8289790ed8 100644
> --- a/drivers/irqchip/Makefile
> +++ b/drivers/irqchip/Makefile
> @@ -95,6 +95,7 @@ obj-$(CONFIG_QCOM_MPM) += irq-qcom-mpm.o
> obj-$(CONFIG_CSKY_MPINTC) += irq-csky-mpintc.o
> obj-$(CONFIG_CSKY_APB_INTC) += irq-csky-apb-intc.o
> obj-$(CONFIG_RISCV_INTC) += irq-riscv-intc.o
> +obj-$(CONFIG_RISCV_APLIC) += irq-riscv-aplic-main.o irq-riscv-aplic-direct.o
> obj-$(CONFIG_RISCV_IMSIC) += irq-riscv-imsic-state.o irq-riscv-imsic-early.o irq-riscv-imsic-platform.o
> obj-$(CONFIG_SIFIVE_PLIC) += irq-sifive-plic.o
> obj-$(CONFIG_IMX_IRQSTEER) += irq-imx-irqsteer.o
> diff --git a/drivers/irqchip/irq-riscv-aplic-direct.c b/drivers/irqchip/irq-riscv-aplic-direct.c
> new file mode 100644
> index 000000000000..9ed2666bfb5e
> --- /dev/null
> +++ b/drivers/irqchip/irq-riscv-aplic-direct.c
> @@ -0,0 +1,343 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> + * Copyright (C) 2022 Ventana Micro Systems Inc.
2024 ?
> + */
> +
> +#include <linux/bitops.h>
> +#include <linux/cpu.h>
> +#include <linux/interrupt.h>
> +#include <linux/irqchip.h>
> +#include <linux/irqchip/chained_irq.h>
> +#include <linux/irqchip/riscv-aplic.h>
> +#include <linux/module.h>
> +#include <linux/of_address.h>
> +#include <linux/printk.h>
> +#include <linux/smp.h>
> +
> +#include "irq-riscv-aplic-main.h"
> +
> +#define APLIC_DISABLE_IDELIVERY 0
> +#define APLIC_ENABLE_IDELIVERY 1
> +#define APLIC_DISABLE_ITHRESHOLD 1
> +#define APLIC_ENABLE_ITHRESHOLD 0
> +
> +struct aplic_direct {
> + struct aplic_priv priv;
> + struct irq_domain *irqdomain;
> + struct cpumask lmask;
> +};
> +
> +struct aplic_idc {
> + unsigned int hart_index;
> + void __iomem *regs;
> + struct aplic_direct *direct;
> +};
> +
> +static unsigned int aplic_direct_parent_irq;
> +static DEFINE_PER_CPU(struct aplic_idc, aplic_idcs);
> +
> +static void aplic_direct_irq_eoi(struct irq_data *d)
> +{
> + /*
> + * The fasteoi_handler requires irq_eoi() callback hence
> + * provide a dummy handler.
> + */
> +}
> +
> +#ifdef CONFIG_SMP
> +static int aplic_direct_set_affinity(struct irq_data *d,
> + const struct cpumask *mask_val, bool force)
> +{
> + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> + struct aplic_direct *direct =
> + container_of(priv, struct aplic_direct, priv);
> + struct aplic_idc *idc;
> + unsigned int cpu, val;
> + struct cpumask amask;
> + void __iomem *target;
> +
> + cpumask_and(&amask, &direct->lmask, mask_val);
> +
> + if (force)
> + cpu = cpumask_first(&amask);
> + else
> + cpu = cpumask_any_and(&amask, cpu_online_mask);
> +
> + if (cpu >= nr_cpu_ids)
> + return -EINVAL;
> +
> + idc = per_cpu_ptr(&aplic_idcs, cpu);
> + target = priv->regs + APLIC_TARGET_BASE;
> + target += (d->hwirq - 1) * sizeof(u32);
> + val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
> + val <<= APLIC_TARGET_HART_IDX_SHIFT;
Hi Anup,
You could use FIELD_PREP() instead of manual mask/shift.
#define APLIC_TARGET_HART_IDX_MASK 0xfffc0000
And then FIELD_PREP(APLIC_TARGET_HART_IDX_MASK, idc->hart_index)
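With that, the two assignments could collapse to something like (just a sketch,
assuming <linux/bitfield.h> is included and APLIC_TARGET_IPRIO_MASK keeps its
current low-bits definition):
	val = FIELD_PREP(APLIC_TARGET_HART_IDX_MASK, idc->hart_index) |
	      FIELD_PREP(APLIC_TARGET_IPRIO_MASK, APLIC_DEFAULT_PRIORITY);
	writel(val, target);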
> + val |= APLIC_DEFAULT_PRIORITY;
> + writel(val, target);
> +
> + irq_data_update_effective_affinity(d, cpumask_of(cpu));
> +
> + return IRQ_SET_MASK_OK_DONE;
> +}
> +#endif
> +
> +static struct irq_chip aplic_direct_chip = {
> + .name = "APLIC-DIRECT",
> + .irq_mask = aplic_irq_mask,
> + .irq_unmask = aplic_irq_unmask,
> + .irq_set_type = aplic_irq_set_type,
> + .irq_eoi = aplic_direct_irq_eoi,
> +#ifdef CONFIG_SMP
> + .irq_set_affinity = aplic_direct_set_affinity,
> +#endif
> + .flags = IRQCHIP_SET_TYPE_MASKED |
> + IRQCHIP_SKIP_SET_WAKE |
> + IRQCHIP_MASK_ON_SUSPEND,
> +};
> +
> +static int aplic_direct_irqdomain_translate(struct irq_domain *d,
> + struct irq_fwspec *fwspec,
> + unsigned long *hwirq,
> + unsigned int *type)
> +{
> + struct aplic_priv *priv = d->host_data;
> +
> + return aplic_irqdomain_translate(fwspec, priv->gsi_base,
> + hwirq, type);
> +}
> +
> +static int aplic_direct_irqdomain_alloc(struct irq_domain *domain,
> + unsigned int virq, unsigned int nr_irqs,
> + void *arg)
> +{
> + int i, ret;
> + unsigned int type;
> + irq_hw_number_t hwirq;
> + struct irq_fwspec *fwspec = arg;
> + struct aplic_priv *priv = domain->host_data;
> + struct aplic_direct *direct =
> + container_of(priv, struct aplic_direct, priv);
> +
> + ret = aplic_irqdomain_translate(fwspec, priv->gsi_base,
> + &hwirq, &type);
> + if (ret)
> + return ret;
> +
> + for (i = 0; i < nr_irqs; i++) {
> + irq_domain_set_info(domain, virq + i, hwirq + i,
> + &aplic_direct_chip, priv,
> + handle_fasteoi_irq, NULL, NULL);
> + irq_set_affinity(virq + i, &direct->lmask);
> + /* See the reason described in aplic_msi_irqdomain_alloc() */
> + irq_set_status_flags(virq + i, IRQ_DISABLE_UNLAZY);
> + }
> +
> + return 0;
> +}
> +
> +static const struct irq_domain_ops aplic_direct_irqdomain_ops = {
> + .translate = aplic_direct_irqdomain_translate,
> + .alloc = aplic_direct_irqdomain_alloc,
> + .free = irq_domain_free_irqs_top,
> +};
> +
> +/*
> + * To handle APLIC direct interrupts, we just read the CLAIMI register,
> + * which returns the highest-priority pending interrupt and clears its
> + * pending bit. This process is repeated until the CLAIMI register
> + * returns zero.
> + */
> +static void aplic_direct_handle_irq(struct irq_desc *desc)
> +{
> + struct aplic_idc *idc = this_cpu_ptr(&aplic_idcs);
> + struct irq_chip *chip = irq_desc_get_chip(desc);
> + struct irq_domain *irqdomain = idc->direct->irqdomain;
> + irq_hw_number_t hw_irq;
> + int irq;
> +
> + chained_irq_enter(chip, desc);
> +
> + while ((hw_irq = readl(idc->regs + APLIC_IDC_CLAIMI))) {
> + hw_irq = hw_irq >> APLIC_IDC_TOPI_ID_SHIFT;
> + irq = irq_find_mapping(irqdomain, hw_irq);
> +
> + if (unlikely(irq <= 0))
> + dev_warn_ratelimited(idc->direct->priv.dev,
> + "hw_irq %lu mapping not found\n",
> + hw_irq);
> + else
> + generic_handle_irq(irq);
> + }
> +
> + chained_irq_exit(chip, desc);
> +}
> +
> +static void aplic_idc_set_delivery(struct aplic_idc *idc, bool en)
> +{
> + u32 de = (en) ? APLIC_ENABLE_IDELIVERY : APLIC_DISABLE_IDELIVERY;
> + u32 th = (en) ? APLIC_ENABLE_ITHRESHOLD : APLIC_DISABLE_ITHRESHOLD;
> +
> + /* Priority must be less than threshold for interrupt triggering */
> + writel(th, idc->regs + APLIC_IDC_ITHRESHOLD);
> +
> + /* Delivery must be set to 1 for interrupt triggering */
> + writel(de, idc->regs + APLIC_IDC_IDELIVERY);
> +}
> +
> +static int aplic_direct_dying_cpu(unsigned int cpu)
> +{
> + if (aplic_direct_parent_irq)
> + disable_percpu_irq(aplic_direct_parent_irq);
> +
> + return 0;
> +}
> +
> +static int aplic_direct_starting_cpu(unsigned int cpu)
> +{
> + if (aplic_direct_parent_irq)
> + enable_percpu_irq(aplic_direct_parent_irq,
> + irq_get_trigger_type(aplic_direct_parent_irq));
> +
> + return 0;
> +}
> +
> +static int aplic_direct_parse_parent_hwirq(struct device *dev,
> + u32 index, u32 *parent_hwirq,
> + unsigned long *parent_hartid)
> +{
> + struct of_phandle_args parent;
> + int rc;
> +
> + /*
> + * Currently, only OF fwnode is supported so extend this
> + * function for ACPI support.
> + */
> + if (!is_of_node(dev->fwnode))
> + return -EINVAL;
> +
> + rc = of_irq_parse_one(to_of_node(dev->fwnode), index, &parent);
> + if (rc)
> + return rc;
> +
> + rc = riscv_of_parent_hartid(parent.np, parent_hartid);
> + if (rc)
> + return rc;
> +
> + *parent_hwirq = parent.args[0];
> + return 0;
> +}
> +
> +int aplic_direct_setup(struct device *dev, void __iomem *regs)
> +{
> + int i, j, rc, cpu, setup_count = 0;
> + struct aplic_direct *direct;
> + struct aplic_priv *priv;
> + struct irq_domain *domain;
> + unsigned long hartid;
> + struct aplic_idc *idc;
> + u32 val, hwirq;
> +
> + direct = kzalloc(sizeof(*direct), GFP_KERNEL);
Use devm_kzalloc() ?
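i.e. something like the sketch below, which would also let the error paths
further down drop the kfree(direct) calls:
	direct = devm_kzalloc(dev, sizeof(*direct), GFP_KERNEL);
	if (!direct)
		return -ENOMEM;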
> + if (!direct)
> + return -ENOMEM;
> + priv = &direct->priv;
> +
> + rc = aplic_setup_priv(priv, dev, regs);
> + if (rc) {
> + dev_err(dev, "failed to create APLIC context\n");
> + kfree(direct);
> + return rc;
> + }
> +
> + /* Setup per-CPU IDC and target CPU mask */
> + for (i = 0; i < priv->nr_idcs; i++) {
> + rc = aplic_direct_parse_parent_hwirq(dev, i, &hwirq, &hartid);
> + if (rc) {
> + dev_warn(dev, "parent irq for IDC%d not found\n", i);
> + continue;
> + }
> +
> + /*
> + * Skip interrupts other than external interrupts for
> + * current privilege level.
> + */
> + if (hwirq != RV_IRQ_EXT)
> + continue;
> +
> + cpu = riscv_hartid_to_cpuid(hartid);
> + if (cpu < 0) {
> + dev_warn(dev, "invalid cpuid for IDC%d\n", i);
> + continue;
> + }
> +
> + cpumask_set_cpu(cpu, &direct->lmask);
> +
> + idc = per_cpu_ptr(&aplic_idcs, cpu);
> + idc->hart_index = i;
> + idc->regs = priv->regs + APLIC_IDC_BASE + i * APLIC_IDC_SIZE;
> + idc->direct = direct;
> +
> + aplic_idc_set_delivery(idc, true);
> +
> + /*
> + * Boot cpu might not have APLIC hart_index = 0 so check
> + * and update target registers of all interrupts.
> + */
> + if (cpu == smp_processor_id() && idc->hart_index) {
> + val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
> + val <<= APLIC_TARGET_HART_IDX_SHIFT;
Ditto (FIELD_PREP)
> + val |= APLIC_DEFAULT_PRIORITY;
> + for (j = 1; j <= priv->nr_irqs; j++)
> + writel(val, priv->regs + APLIC_TARGET_BASE +
> + (j - 1) * sizeof(u32));
> + }
> +
> + setup_count++;
> + }
> +
> + /* Find parent domain and register chained handler */
> + domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
> + DOMAIN_BUS_ANY);
> + if (!aplic_direct_parent_irq && domain) {
> + aplic_direct_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
> + if (aplic_direct_parent_irq) {
> + irq_set_chained_handler(aplic_direct_parent_irq,
> + aplic_direct_handle_irq);
> +
> + /*
> + * Setup CPUHP notifier to enable parent
> + * interrupt on all CPUs
> + */
> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> + "irqchip/riscv/aplic:starting",
> + aplic_direct_starting_cpu,
> + aplic_direct_dying_cpu);
> + }
> + }
> +
> + /* Fail if we were not able to setup IDC for any CPU */
> + if (!setup_count) {
> + kfree(direct);
Shouldn't the cpuhp state also be destroyed (cpuhp_remove_state()) ?
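e.g. by saving the dynamically allocated state (rough sketch; "cpuhp_state"
is a new variable introduced only for illustration):
	cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
					"irqchip/riscv/aplic:starting",
					aplic_direct_starting_cpu,
					aplic_direct_dying_cpu);
	/* ... and later, in the error path ... */
	if (cpuhp_state > 0)
		cpuhp_remove_state(cpuhp_state);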
> + return -ENODEV;
> + }
> +
> + /* Setup global config and interrupt delivery */
> + aplic_init_hw_global(priv, false);
> +
> + /* Create irq domain instance for the APLIC */
> + direct->irqdomain = irq_domain_create_linear(dev->fwnode,
> + priv->nr_irqs + 1,
> + &aplic_direct_irqdomain_ops,
> + priv);
> + if (!direct->irqdomain) {
> + dev_err(dev, "failed to create direct irq domain\n");
> + kfree(direct);
> + return -ENOMEM;
> + }
> +
> + /* Advertise the interrupt controller */
> + dev_info(dev, "%d interrupts directly connected to %d CPUs\n",
> + priv->nr_irqs, priv->nr_idcs);
> +
> + return 0;
> +}
> diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
> new file mode 100644
> index 000000000000..87450708a733
> --- /dev/null
> +++ b/drivers/irqchip/irq-riscv-aplic-main.c
> @@ -0,0 +1,232 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> + * Copyright (C) 2022 Ventana Micro Systems Inc.
2024 ?
> + */
> +
> +#include <linux/of.h>
> +#include <linux/of_irq.h>
> +#include <linux/printk.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/irqchip/riscv-aplic.h>
> +
> +#include "irq-riscv-aplic-main.h"
> +
> +void aplic_irq_unmask(struct irq_data *d)
> +{
> + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> +
> + writel(d->hwirq, priv->regs + APLIC_SETIENUM);
> +}
> +
> +void aplic_irq_mask(struct irq_data *d)
> +{
> + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> +
> + writel(d->hwirq, priv->regs + APLIC_CLRIENUM);
> +}
> +
> +int aplic_irq_set_type(struct irq_data *d, unsigned int type)
> +{
> + u32 val = 0;
> + void __iomem *sourcecfg;
> + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> +
> + switch (type) {
> + case IRQ_TYPE_NONE:
> + val = APLIC_SOURCECFG_SM_INACTIVE;
> + break;
> + case IRQ_TYPE_LEVEL_LOW:
> + val = APLIC_SOURCECFG_SM_LEVEL_LOW;
> + break;
> + case IRQ_TYPE_LEVEL_HIGH:
> + val = APLIC_SOURCECFG_SM_LEVEL_HIGH;
> + break;
> + case IRQ_TYPE_EDGE_FALLING:
> + val = APLIC_SOURCECFG_SM_EDGE_FALL;
> + break;
> + case IRQ_TYPE_EDGE_RISING:
> + val = APLIC_SOURCECFG_SM_EDGE_RISE;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + sourcecfg = priv->regs + APLIC_SOURCECFG_BASE;
> + sourcecfg += (d->hwirq - 1) * sizeof(u32);
> + writel(val, sourcecfg);
> +
> + return 0;
> +}
> +
> +int aplic_irqdomain_translate(struct irq_fwspec *fwspec, u32 gsi_base,
> + unsigned long *hwirq, unsigned int *type)
> +{
> + if (WARN_ON(fwspec->param_count < 2))
> + return -EINVAL;
> + if (WARN_ON(!fwspec->param[0]))
> + return -EINVAL;
> +
> + /* For DT, gsi_base is always zero. */
> + *hwirq = fwspec->param[0] - gsi_base;
> + *type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
> +
> + WARN_ON(*type == IRQ_TYPE_NONE);
> +
> + return 0;
> +}
> +
> +void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode)
> +{
> + u32 val;
> +#ifdef CONFIG_RISCV_M_MODE
> + u32 valH;
> +
> + if (msi_mode) {
> + val = priv->msicfg.base_ppn;
> + valH = ((u64)priv->msicfg.base_ppn >> 32) &
> + APLIC_xMSICFGADDRH_BAPPN_MASK;
> + valH |= (priv->msicfg.lhxw & APLIC_xMSICFGADDRH_LHXW_MASK)
> + << APLIC_xMSICFGADDRH_LHXW_SHIFT;
> + valH |= (priv->msicfg.hhxw & APLIC_xMSICFGADDRH_HHXW_MASK)
> + << APLIC_xMSICFGADDRH_HHXW_SHIFT;
> + valH |= (priv->msicfg.lhxs & APLIC_xMSICFGADDRH_LHXS_MASK)
> + << APLIC_xMSICFGADDRH_LHXS_SHIFT;
> + valH |= (priv->msicfg.hhxs & APLIC_xMSICFGADDRH_HHXS_MASK)
> + << APLIC_xMSICFGADDRH_HHXS_SHIFT;
Use FIELD_PREP for all of these.
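e.g. something along these lines (rough sketch only; the GENMASK() field
definitions are derived from the existing MASK/SHIFT pairs and assume
<linux/bitfield.h>):
	#define APLIC_xMSICFGADDRH_HHXS	GENMASK(28, 24)
	#define APLIC_xMSICFGADDRH_LHXS	GENMASK(22, 20)
	#define APLIC_xMSICFGADDRH_HHXW	GENMASK(18, 16)
	#define APLIC_xMSICFGADDRH_LHXW	GENMASK(15, 12)
	valH = FIELD_PREP(APLIC_xMSICFGADDRH_BAPPN_MASK, (u64)priv->msicfg.base_ppn >> 32) |
	       FIELD_PREP(APLIC_xMSICFGADDRH_LHXW, priv->msicfg.lhxw) |
	       FIELD_PREP(APLIC_xMSICFGADDRH_HHXW, priv->msicfg.hhxw) |
	       FIELD_PREP(APLIC_xMSICFGADDRH_LHXS, priv->msicfg.lhxs) |
	       FIELD_PREP(APLIC_xMSICFGADDRH_HHXS, priv->msicfg.hhxs);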
> + writel(val, priv->regs + APLIC_xMSICFGADDR);
> + writel(valH, priv->regs + APLIC_xMSICFGADDRH);
> + }
> +#endif
> +
> + /* Setup APLIC domaincfg register */
> + val = readl(priv->regs + APLIC_DOMAINCFG);
> + val |= APLIC_DOMAINCFG_IE;
> + if (msi_mode)
> + val |= APLIC_DOMAINCFG_DM;
> + writel(val, priv->regs + APLIC_DOMAINCFG);
> + if (readl(priv->regs + APLIC_DOMAINCFG) != val)
> + dev_warn(priv->dev, "unable to write 0x%x in domaincfg\n",
> + val);
> +}
> +
> +static void aplic_init_hw_irqs(struct aplic_priv *priv)
> +{
> + int i;
> +
> + /* Disable all interrupts */
> + for (i = 0; i <= priv->nr_irqs; i += 32)
> + writel(-1U, priv->regs + APLIC_CLRIE_BASE +
> + (i / 32) * sizeof(u32));
> +
> + /* Set interrupt type and default priority for all interrupts */
> + for (i = 1; i <= priv->nr_irqs; i++) {
> + writel(0, priv->regs + APLIC_SOURCECFG_BASE +
> + (i - 1) * sizeof(u32));
> + writel(APLIC_DEFAULT_PRIORITY,
> + priv->regs + APLIC_TARGET_BASE +
> + (i - 1) * sizeof(u32));
> + }
> +
> + /* Clear APLIC domaincfg */
> + writel(0, priv->regs + APLIC_DOMAINCFG);
> +}
> +
> +int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
> + void __iomem *regs)
> +{
> + struct of_phandle_args parent;
> + int rc;
> +
> + /*
> + * Currently, only OF fwnode is supported so extend this
> + * function for ACPI support.
> + */
> + if (!is_of_node(dev->fwnode))
> + return -EINVAL;
> +
> + /* Save device pointer and register base */
> + priv->dev = dev;
> + priv->regs = regs;
> +
> + /* Find out number of interrupt sources */
> + rc = of_property_read_u32(to_of_node(dev->fwnode),
> + "riscv,num-sources",
> + &priv->nr_irqs);
Use device_property_read_u32() which works for both ACPI and OF.
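i.e. (sketch):
	rc = device_property_read_u32(dev, "riscv,num-sources", &priv->nr_irqs);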
> + if (rc) {
> + dev_err(dev, "failed to get number of interrupt sources\n");
> + return rc;
> + }
> +
> + /*
> + * Find out number of IDCs based on parent interrupts
> + *
> + * If "msi-parent" property is present then we ignore the
> + * APLIC IDCs which forces the APLIC driver to use MSI mode.
> + */
> + if (!of_property_present(to_of_node(dev->fwnode), "msi-parent")) {
device_property_present()
> + while (!of_irq_parse_one(to_of_node(dev->fwnode),
> + priv->nr_idcs, &parent))
> + priv->nr_idcs++;
> + }
> +
> + /* Setup initial state APLIC interrupts */
> + aplic_init_hw_irqs(priv);
> +
> + return 0;
> +}
> +
> +static int aplic_probe(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + bool msi_mode = false;
> + struct resource *res;
> + void __iomem *regs;
> + int rc;
> +
> + /* Map the MMIO registers */
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (!res) {
> + dev_err(dev, "failed to get MMIO resource\n");
> + return -EINVAL;
> + }
> + regs = devm_ioremap(&pdev->dev, res->start, resource_size(res));
> + if (!regs) {
> + dev_err(dev, "failed to map MMIO registers\n");
> + return -ENOMEM;
> + }
Maybe use devm_platform_ioremap_resource() since you don't need "res"
after that.
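i.e. something like (sketch):
	regs = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(regs))
		return PTR_ERR(regs);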
> +
> + /*
> + * If msi-parent property is present then setup APLIC MSI
> + * mode otherwise setup APLIC direct mode.
> + */
> + if (is_of_node(dev->fwnode))
> + msi_mode = of_property_present(to_of_node(dev->fwnode),
> + "msi-parent");
> + if (msi_mode)
> + rc = -ENODEV;
> + else
> + rc = aplic_direct_setup(dev, regs);
> + if (rc) {
> + dev_err(dev, "failed setup APLIC in %s mode\n",
nitpick: maybe reword it like "Failed to setup APLIC" or "APLIC setup
failed in %s mode"
> + msi_mode ? "MSI" : "direct");
> + return rc;
Remove this return.
> + }
> +
> + return 0;
return rc;
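i.e. the tail of aplic_probe() could then end up looking roughly like (sketch
folding in the suggestions above):
	if (rc)
		dev_err(dev, "failed to setup APLIC in %s mode\n",
			msi_mode ? "MSI" : "direct");
	return rc;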
> +}
> +
> +static const struct of_device_id aplic_match[] = {
> + { .compatible = "riscv,aplic" },
> + {}
> +};
> +
> +static struct platform_driver aplic_driver = {
> + .driver = {
> + .name = "riscv-aplic",
> + .of_match_table = aplic_match,
> + },
> + .probe = aplic_probe,
> +};
> +builtin_platform_driver(aplic_driver);
> diff --git a/drivers/irqchip/irq-riscv-aplic-main.h b/drivers/irqchip/irq-riscv-aplic-main.h
> new file mode 100644
> index 000000000000..474a04229334
> --- /dev/null
> +++ b/drivers/irqchip/irq-riscv-aplic-main.h
> @@ -0,0 +1,45 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> + * Copyright (C) 2022 Ventana Micro Systems Inc.
2024 ?
> + */
> +
> +#ifndef _IRQ_RISCV_APLIC_MAIN_H
> +#define _IRQ_RISCV_APLIC_MAIN_H
> +
> +#include <linux/device.h>
> +#include <linux/io.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/fwnode.h>
> +
> +#define APLIC_DEFAULT_PRIORITY 1
> +
> +struct aplic_msicfg {
> + phys_addr_t base_ppn;
> + u32 hhxs;
> + u32 hhxw;
> + u32 lhxs;
> + u32 lhxw;
> +};
> +
> +struct aplic_priv {
> + struct device *dev;
> + u32 gsi_base;
> + u32 nr_irqs;
> + u32 nr_idcs;
> + void __iomem *regs;
> + struct aplic_msicfg msicfg;
> +};
> +
> +void aplic_irq_unmask(struct irq_data *d);
> +void aplic_irq_mask(struct irq_data *d);
> +int aplic_irq_set_type(struct irq_data *d, unsigned int type);
> +int aplic_irqdomain_translate(struct irq_fwspec *fwspec, u32 gsi_base,
> + unsigned long *hwirq, unsigned int *type);
> +void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode);
> +int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
> + void __iomem *regs);
> +int aplic_direct_setup(struct device *dev, void __iomem *regs);
> +
> +#endif
> diff --git a/include/linux/irqchip/riscv-aplic.h b/include/linux/irqchip/riscv-aplic.h
> new file mode 100644
> index 000000000000..97e198ea0109
> --- /dev/null
> +++ b/include/linux/irqchip/riscv-aplic.h
> @@ -0,0 +1,119 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> + * Copyright (C) 2022 Ventana Micro Systems Inc.
> + */
> +#ifndef __LINUX_IRQCHIP_RISCV_APLIC_H
> +#define __LINUX_IRQCHIP_RISCV_APLIC_H
> +
> +#include <linux/bitops.h>
> +
> +#define APLIC_MAX_IDC BIT(14)
> +#define APLIC_MAX_SOURCE 1024
> +
> +#define APLIC_DOMAINCFG 0x0000
> +#define APLIC_DOMAINCFG_RDONLY 0x80000000
> +#define APLIC_DOMAINCFG_IE BIT(8)
> +#define APLIC_DOMAINCFG_DM BIT(2)
> +#define APLIC_DOMAINCFG_BE BIT(0)
> +
> +#define APLIC_SOURCECFG_BASE 0x0004
> +#define APLIC_SOURCECFG_D BIT(10)
> +#define APLIC_SOURCECFG_CHILDIDX_MASK 0x000003ff
> +#define APLIC_SOURCECFG_SM_MASK 0x00000007
> +#define APLIC_SOURCECFG_SM_INACTIVE 0x0
> +#define APLIC_SOURCECFG_SM_DETACH 0x1
> +#define APLIC_SOURCECFG_SM_EDGE_RISE 0x4
> +#define APLIC_SOURCECFG_SM_EDGE_FALL 0x5
> +#define APLIC_SOURCECFG_SM_LEVEL_HIGH 0x6
> +#define APLIC_SOURCECFG_SM_LEVEL_LOW 0x7
> +
> +#define APLIC_MMSICFGADDR 0x1bc0
> +#define APLIC_MMSICFGADDRH 0x1bc4
> +#define APLIC_SMSICFGADDR 0x1bc8
> +#define APLIC_SMSICFGADDRH 0x1bcc
> +
> +#ifdef CONFIG_RISCV_M_MODE
> +#define APLIC_xMSICFGADDR APLIC_MMSICFGADDR
> +#define APLIC_xMSICFGADDRH APLIC_MMSICFGADDRH
> +#else
> +#define APLIC_xMSICFGADDR APLIC_SMSICFGADDR
> +#define APLIC_xMSICFGADDRH APLIC_SMSICFGADDRH
> +#endif
> +
> +#define APLIC_xMSICFGADDRH_L BIT(31)
> +#define APLIC_xMSICFGADDRH_HHXS_MASK 0x1f
> +#define APLIC_xMSICFGADDRH_HHXS_SHIFT 24
> +#define APLIC_xMSICFGADDRH_LHXS_MASK 0x7
> +#define APLIC_xMSICFGADDRH_LHXS_SHIFT 20
> +#define APLIC_xMSICFGADDRH_HHXW_MASK 0x7
> +#define APLIC_xMSICFGADDRH_HHXW_SHIFT 16
> +#define APLIC_xMSICFGADDRH_LHXW_MASK 0xf
> +#define APLIC_xMSICFGADDRH_LHXW_SHIFT 12
> +#define APLIC_xMSICFGADDRH_BAPPN_MASK 0xfff
> +
> +#define APLIC_xMSICFGADDR_PPN_SHIFT 12
> +
> +#define APLIC_xMSICFGADDR_PPN_HART(__lhxs) \
> + (BIT(__lhxs) - 1)
> +
> +#define APLIC_xMSICFGADDR_PPN_LHX_MASK(__lhxw) \
> + (BIT(__lhxw) - 1)
> +#define APLIC_xMSICFGADDR_PPN_LHX_SHIFT(__lhxs) \
> + ((__lhxs))
> +#define APLIC_xMSICFGADDR_PPN_LHX(__lhxw, __lhxs) \
> + (APLIC_xMSICFGADDR_PPN_LHX_MASK(__lhxw) << \
> + APLIC_xMSICFGADDR_PPN_LHX_SHIFT(__lhxs))
> +
> +#define APLIC_xMSICFGADDR_PPN_HHX_MASK(__hhxw) \
> + (BIT(__hhxw) - 1)
> +#define APLIC_xMSICFGADDR_PPN_HHX_SHIFT(__hhxs) \
> + ((__hhxs) + APLIC_xMSICFGADDR_PPN_SHIFT)
> +#define APLIC_xMSICFGADDR_PPN_HHX(__hhxw, __hhxs) \
> + (APLIC_xMSICFGADDR_PPN_HHX_MASK(__hhxw) << \
> + APLIC_xMSICFGADDR_PPN_HHX_SHIFT(__hhxs))
> +
> +#define APLIC_IRQBITS_PER_REG 32
> +
> +#define APLIC_SETIP_BASE 0x1c00
> +#define APLIC_SETIPNUM 0x1cdc
> +
> +#define APLIC_CLRIP_BASE 0x1d00
> +#define APLIC_CLRIPNUM 0x1ddc
> +
> +#define APLIC_SETIE_BASE 0x1e00
> +#define APLIC_SETIENUM 0x1edc
> +
> +#define APLIC_CLRIE_BASE 0x1f00
> +#define APLIC_CLRIENUM 0x1fdc
> +
> +#define APLIC_SETIPNUM_LE 0x2000
> +#define APLIC_SETIPNUM_BE 0x2004
> +
> +#define APLIC_GENMSI 0x3000
> +
> +#define APLIC_TARGET_BASE 0x3004
> +#define APLIC_TARGET_HART_IDX_SHIFT 18
> +#define APLIC_TARGET_HART_IDX_MASK 0x3fff
> +#define APLIC_TARGET_GUEST_IDX_SHIFT 12
> +#define APLIC_TARGET_GUEST_IDX_MASK 0x3f
> +#define APLIC_TARGET_IPRIO_MASK 0xff
> +#define APLIC_TARGET_EIID_MASK 0x7ff
> +
> +#define APLIC_IDC_BASE 0x4000
> +#define APLIC_IDC_SIZE 32
> +
> +#define APLIC_IDC_IDELIVERY 0x00
> +
> +#define APLIC_IDC_IFORCE 0x04
> +
> +#define APLIC_IDC_ITHRESHOLD 0x08
> +
> +#define APLIC_IDC_TOPI 0x18
> +#define APLIC_IDC_TOPI_ID_SHIFT 16
> +#define APLIC_IDC_TOPI_ID_MASK 0x3ff
> +#define APLIC_IDC_TOPI_PRIO_MASK 0xff
> +
> +#define APLIC_IDC_CLAIMI 0x1c
> +
> +#endif
Thanks,
Clément
On Fri, Feb 2, 2024 at 2:59 PM Clément Léger <[email protected]> wrote:
>
>
>
> On 27/01/2024 17:17, Anup Patel wrote:
> > The RISC-V advanced interrupt architecture (AIA) specification defines
> > advanced platform-level interrupt controller (APLIC) which has two modes
> > of operation: 1) Direct mode and 2) MSI mode.
> > (For more details, refer https://github.com/riscv/riscv-aia)
> >
> > In APLIC direct-mode, wired interrupts are forwared to CPUs (or HARTs)
> > as a local external interrupt.
> >
> > We add a platform irqchip driver for the RISC-V APLIC direct-mode to
> > support RISC-V platforms having only wired interrupts.
> >
> > Signed-off-by: Anup Patel <[email protected]>
> > ---
> > drivers/irqchip/Kconfig | 5 +
> > drivers/irqchip/Makefile | 1 +
> > drivers/irqchip/irq-riscv-aplic-direct.c | 343 +++++++++++++++++++++++
> > drivers/irqchip/irq-riscv-aplic-main.c | 232 +++++++++++++++
> > drivers/irqchip/irq-riscv-aplic-main.h | 45 +++
> > include/linux/irqchip/riscv-aplic.h | 119 ++++++++
> > 6 files changed, 745 insertions(+)
> > create mode 100644 drivers/irqchip/irq-riscv-aplic-direct.c
> > create mode 100644 drivers/irqchip/irq-riscv-aplic-main.c
> > create mode 100644 drivers/irqchip/irq-riscv-aplic-main.h
> > create mode 100644 include/linux/irqchip/riscv-aplic.h
> >
> > diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> > index 2fc0cb32341a..dbc8811d3764 100644
> > --- a/drivers/irqchip/Kconfig
> > +++ b/drivers/irqchip/Kconfig
> > @@ -546,6 +546,11 @@ config SIFIVE_PLIC
> > select IRQ_DOMAIN_HIERARCHY
> > select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
> >
> > +config RISCV_APLIC
> > + bool
> > + depends on RISCV
> > + select IRQ_DOMAIN_HIERARCHY
> > +
> > config RISCV_IMSIC
> > bool
> > depends on RISCV
> > diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
> > index abca445a3229..7f8289790ed8 100644
> > --- a/drivers/irqchip/Makefile
> > +++ b/drivers/irqchip/Makefile
> > @@ -95,6 +95,7 @@ obj-$(CONFIG_QCOM_MPM) += irq-qcom-mpm.o
> > obj-$(CONFIG_CSKY_MPINTC) += irq-csky-mpintc.o
> > obj-$(CONFIG_CSKY_APB_INTC) += irq-csky-apb-intc.o
> > obj-$(CONFIG_RISCV_INTC) += irq-riscv-intc.o
> > +obj-$(CONFIG_RISCV_APLIC) += irq-riscv-aplic-main.o irq-riscv-aplic-direct.o
> > obj-$(CONFIG_RISCV_IMSIC) += irq-riscv-imsic-state.o irq-riscv-imsic-early.o irq-riscv-imsic-platform.o
> > obj-$(CONFIG_SIFIVE_PLIC) += irq-sifive-plic.o
> > obj-$(CONFIG_IMX_IRQSTEER) += irq-imx-irqsteer.o
> > diff --git a/drivers/irqchip/irq-riscv-aplic-direct.c b/drivers/irqchip/irq-riscv-aplic-direct.c
> > new file mode 100644
> > index 000000000000..9ed2666bfb5e
> > --- /dev/null
> > +++ b/drivers/irqchip/irq-riscv-aplic-direct.c
> > @@ -0,0 +1,343 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> > + * Copyright (C) 2022 Ventana Micro Systems Inc.
>
> 2024 ?
Well, it was the year 2022 when the patch was first posted.
Are we supposed to revise this with every passing year ?
>
> > + */
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/cpu.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/irqchip.h>
> > +#include <linux/irqchip/chained_irq.h>
> > +#include <linux/irqchip/riscv-aplic.h>
> > +#include <linux/module.h>
> > +#include <linux/of_address.h>
> > +#include <linux/printk.h>
> > +#include <linux/smp.h>
> > +
> > +#include "irq-riscv-aplic-main.h"
> > +
> > +#define APLIC_DISABLE_IDELIVERY 0
> > +#define APLIC_ENABLE_IDELIVERY 1
> > +#define APLIC_DISABLE_ITHRESHOLD 1
> > +#define APLIC_ENABLE_ITHRESHOLD 0
> > +
> > +struct aplic_direct {
> > + struct aplic_priv priv;
> > + struct irq_domain *irqdomain;
> > + struct cpumask lmask;
> > +};
> > +
> > +struct aplic_idc {
> > + unsigned int hart_index;
> > + void __iomem *regs;
> > + struct aplic_direct *direct;
> > +};
> > +
> > +static unsigned int aplic_direct_parent_irq;
> > +static DEFINE_PER_CPU(struct aplic_idc, aplic_idcs);
> > +
> > +static void aplic_direct_irq_eoi(struct irq_data *d)
> > +{
> > + /*
> > + * The fasteoi_handler requires irq_eoi() callback hence
> > + * provide a dummy handler.
> > + */
> > +}
> > +
> > +#ifdef CONFIG_SMP
> > +static int aplic_direct_set_affinity(struct irq_data *d,
> > + const struct cpumask *mask_val, bool force)
> > +{
> > + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> > + struct aplic_direct *direct =
> > + container_of(priv, struct aplic_direct, priv);
> > + struct aplic_idc *idc;
> > + unsigned int cpu, val;
> > + struct cpumask amask;
> > + void __iomem *target;
> > +
> > + cpumask_and(&amask, &direct->lmask, mask_val);
> > +
> > + if (force)
> > + cpu = cpumask_first(&amask);
> > + else
> > + cpu = cpumask_any_and(&amask, cpu_online_mask);
> > +
> > + if (cpu >= nr_cpu_ids)
> > + return -EINVAL;
> > +
> > + idc = per_cpu_ptr(&aplic_idcs, cpu);
> > + target = priv->regs + APLIC_TARGET_BASE;
> > + target += (d->hwirq - 1) * sizeof(u32);
> > + val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
> > + val <<= APLIC_TARGET_HART_IDX_SHIFT;
>
> Hi Anup,
>
> You could use FIELD_PREP() instead of manual mask/shift.
>
> #define APLIC_TARGET_HART_IDX_MASK 0xfffc0000
>
> And then FIELD_PREP(APLIC_TARGET_HART_IDX_MASK, idc->hart_index)
Okay, I will update.
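For reference, a rough (untested) sketch of that conversion, assuming
the two TARGET field masks in riscv-aplic.h are redefined as
full-register masks via GENMASK() and <linux/bitfield.h> is included:

#define APLIC_TARGET_HART_IDX_MASK	GENMASK(31, 18)
#define APLIC_TARGET_IPRIO_MASK		GENMASK(7, 0)

	/* compose the target register from its fields */
	val = FIELD_PREP(APLIC_TARGET_HART_IDX_MASK, idc->hart_index) |
	      FIELD_PREP(APLIC_TARGET_IPRIO_MASK, APLIC_DEFAULT_PRIORITY);
	writel(val, target);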
>
>
> > + val |= APLIC_DEFAULT_PRIORITY;
> > + writel(val, target);
> > +
> > + irq_data_update_effective_affinity(d, cpumask_of(cpu));
> > +
> > + return IRQ_SET_MASK_OK_DONE;
> > +}
> > +#endif
> > +
> > +static struct irq_chip aplic_direct_chip = {
> > + .name = "APLIC-DIRECT",
> > + .irq_mask = aplic_irq_mask,
> > + .irq_unmask = aplic_irq_unmask,
> > + .irq_set_type = aplic_irq_set_type,
> > + .irq_eoi = aplic_direct_irq_eoi,
> > +#ifdef CONFIG_SMP
> > + .irq_set_affinity = aplic_direct_set_affinity,
> > +#endif
> > + .flags = IRQCHIP_SET_TYPE_MASKED |
> > + IRQCHIP_SKIP_SET_WAKE |
> > + IRQCHIP_MASK_ON_SUSPEND,
> > +};
> > +
> > +static int aplic_direct_irqdomain_translate(struct irq_domain *d,
> > + struct irq_fwspec *fwspec,
> > + unsigned long *hwirq,
> > + unsigned int *type)
> > +{
> > + struct aplic_priv *priv = d->host_data;
> > +
> > + return aplic_irqdomain_translate(fwspec, priv->gsi_base,
> > + hwirq, type);
> > +}
> > +
> > +static int aplic_direct_irqdomain_alloc(struct irq_domain *domain,
> > + unsigned int virq, unsigned int nr_irqs,
> > + void *arg)
> > +{
> > + int i, ret;
> > + unsigned int type;
> > + irq_hw_number_t hwirq;
> > + struct irq_fwspec *fwspec = arg;
> > + struct aplic_priv *priv = domain->host_data;
> > + struct aplic_direct *direct =
> > + container_of(priv, struct aplic_direct, priv);
> > +
> > + ret = aplic_irqdomain_translate(fwspec, priv->gsi_base,
> > + &hwirq, &type);
> > + if (ret)
> > + return ret;
> > +
> > + for (i = 0; i < nr_irqs; i++) {
> > + irq_domain_set_info(domain, virq + i, hwirq + i,
> > + &aplic_direct_chip, priv,
> > + handle_fasteoi_irq, NULL, NULL);
> > + irq_set_affinity(virq + i, &direct->lmask);
> > + /* See the reason described in aplic_msi_irqdomain_alloc() */
> > + irq_set_status_flags(virq + i, IRQ_DISABLE_UNLAZY);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct irq_domain_ops aplic_direct_irqdomain_ops = {
> > + .translate = aplic_direct_irqdomain_translate,
> > + .alloc = aplic_direct_irqdomain_alloc,
> > + .free = irq_domain_free_irqs_top,
> > +};
> > +
> > +/*
> > + * To handle an APLIC direct interrupts, we just read the CLAIMI register
> > + * which will return highest priority pending interrupt and clear the
> > + * pending bit of the interrupt. This process is repeated until CLAIMI
> > + * register return zero value.
> > + */
> > +static void aplic_direct_handle_irq(struct irq_desc *desc)
> > +{
> > + struct aplic_idc *idc = this_cpu_ptr(&aplic_idcs);
> > + struct irq_chip *chip = irq_desc_get_chip(desc);
> > + struct irq_domain *irqdomain = idc->direct->irqdomain;
> > + irq_hw_number_t hw_irq;
> > + int irq;
> > +
> > + chained_irq_enter(chip, desc);
> > +
> > + while ((hw_irq = readl(idc->regs + APLIC_IDC_CLAIMI))) {
> > + hw_irq = hw_irq >> APLIC_IDC_TOPI_ID_SHIFT;
> > + irq = irq_find_mapping(irqdomain, hw_irq);
> > +
> > + if (unlikely(irq <= 0))
> > + dev_warn_ratelimited(idc->direct->priv.dev,
> > + "hw_irq %lu mapping not found\n",
> > + hw_irq);
> > + else
> > + generic_handle_irq(irq);
> > + }
> > +
> > + chained_irq_exit(chip, desc);
> > +}
> > +
> > +static void aplic_idc_set_delivery(struct aplic_idc *idc, bool en)
> > +{
> > + u32 de = (en) ? APLIC_ENABLE_IDELIVERY : APLIC_DISABLE_IDELIVERY;
> > + u32 th = (en) ? APLIC_ENABLE_ITHRESHOLD : APLIC_DISABLE_ITHRESHOLD;
> > +
> > + /* Priority must be less than threshold for interrupt triggering */
> > + writel(th, idc->regs + APLIC_IDC_ITHRESHOLD);
> > +
> > + /* Delivery must be set to 1 for interrupt triggering */
> > + writel(de, idc->regs + APLIC_IDC_IDELIVERY);
> > +}
> > +
> > +static int aplic_direct_dying_cpu(unsigned int cpu)
> > +{
> > + if (aplic_direct_parent_irq)
> > + disable_percpu_irq(aplic_direct_parent_irq);
> > +
> > + return 0;
> > +}
> > +
> > +static int aplic_direct_starting_cpu(unsigned int cpu)
> > +{
> > + if (aplic_direct_parent_irq)
> > + enable_percpu_irq(aplic_direct_parent_irq,
> > + irq_get_trigger_type(aplic_direct_parent_irq));
> > +
> > + return 0;
> > +}
> > +
> > +static int aplic_direct_parse_parent_hwirq(struct device *dev,
> > + u32 index, u32 *parent_hwirq,
> > + unsigned long *parent_hartid)
> > +{
> > + struct of_phandle_args parent;
> > + int rc;
> > +
> > + /*
> > + * Currently, only OF fwnode is supported so extend this
> > + * function for ACPI support.
> > + */
> > + if (!is_of_node(dev->fwnode))
> > + return -EINVAL;
> > +
> > + rc = of_irq_parse_one(to_of_node(dev->fwnode), index, &parent);
> > + if (rc)
> > + return rc;
> > +
> > + rc = riscv_of_parent_hartid(parent.np, parent_hartid);
> > + if (rc)
> > + return rc;
> > +
> > + *parent_hwirq = parent.args[0];
> > + return 0;
> > +}
> > +
> > +int aplic_direct_setup(struct device *dev, void __iomem *regs)
> > +{
> > + int i, j, rc, cpu, setup_count = 0;
> > + struct aplic_direct *direct;
> > + struct aplic_priv *priv;
> > + struct irq_domain *domain;
> > + unsigned long hartid;
> > + struct aplic_idc *idc;
> > + u32 val, hwirq;
> > +
> > + direct = kzalloc(sizeof(*direct), GFP_KERNEL);
>
> Use devm_kzalloc() ?
Okay, I will update.
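For example (untested), and with devm_kzalloc() the explicit
kfree(direct) calls in the error paths below can be dropped as well:

	direct = devm_kzalloc(dev, sizeof(*direct), GFP_KERNEL);
	if (!direct)
		return -ENOMEM;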
>
> > + if (!direct)
> > + return -ENOMEM;
> > + priv = &direct->priv;
> > +
> > + rc = aplic_setup_priv(priv, dev, regs);
> > + if (rc) {
> > + dev_err(dev, "failed to create APLIC context\n");
> > + kfree(direct);
> > + return rc;
> > + }
> > +
> > + /* Setup per-CPU IDC and target CPU mask */
> > + for (i = 0; i < priv->nr_idcs; i++) {
> > + rc = aplic_direct_parse_parent_hwirq(dev, i, &hwirq, &hartid);
> > + if (rc) {
> > + dev_warn(dev, "parent irq for IDC%d not found\n", i);
> > + continue;
> > + }
> > +
> > + /*
> > + * Skip interrupts other than external interrupts for
> > + * current privilege level.
> > + */
> > + if (hwirq != RV_IRQ_EXT)
> > + continue;
> > +
> > + cpu = riscv_hartid_to_cpuid(hartid);
> > + if (cpu < 0) {
> > + dev_warn(dev, "invalid cpuid for IDC%d\n", i);
> > + continue;
> > + }
> > +
> > + cpumask_set_cpu(cpu, &direct->lmask);
> > +
> > + idc = per_cpu_ptr(&aplic_idcs, cpu);
> > + idc->hart_index = i;
> > + idc->regs = priv->regs + APLIC_IDC_BASE + i * APLIC_IDC_SIZE;
> > + idc->direct = direct;
> > +
> > + aplic_idc_set_delivery(idc, true);
> > +
> > + /*
> > + * Boot cpu might not have APLIC hart_index = 0 so check
> > + * and update target registers of all interrupts.
> > + */
> > + if (cpu == smp_processor_id() && idc->hart_index) {
> > + val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
> > + val <<= APLIC_TARGET_HART_IDX_SHIFT;
>
> Ditto (FIELD_PREP)
Okay, I will update.
>
> > + val |= APLIC_DEFAULT_PRIORITY;
> > + for (j = 1; j <= priv->nr_irqs; j++)
> > + writel(val, priv->regs + APLIC_TARGET_BASE +
> > + (j - 1) * sizeof(u32));
> > + }
> > +
> > + setup_count++;
> > + }
> > +
> > + /* Find parent domain and register chained handler */
> > + domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
> > + DOMAIN_BUS_ANY);
> > + if (!aplic_direct_parent_irq && domain) {
> > + aplic_direct_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
> > + if (aplic_direct_parent_irq) {
> > + irq_set_chained_handler(aplic_direct_parent_irq,
> > + aplic_direct_handle_irq);
> > +
> > + /*
> > + * Setup CPUHP notifier to enable parent
> > + * interrupt on all CPUs
> > + */
> > + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> > + "irqchip/riscv/aplic:starting",
> > + aplic_direct_starting_cpu,
> > + aplic_direct_dying_cpu);
> > + }
> > + }
> > +
> > + /* Fail if we were not able to setup IDC for any CPU */
> > + if (!setup_count) {
> > + kfree(direct);
>
> Shouldn't the cpuhp state also be destroyed (cpuhp_remove_state()) ?
Okay, I will update.
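Right, for CPUHP_AP_ONLINE_DYN the cpuhp_setup_state() return value is
the dynamically allocated state number, so it has to be saved and handed
back to cpuhp_remove_state() in the failure path. Rough (untested)
sketch:

	int cpuhp_state = 0;
	...
	cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
					"irqchip/riscv/aplic:starting",
					aplic_direct_starting_cpu,
					aplic_direct_dying_cpu);
	...
	/* Fail if we were not able to setup IDC for any CPU */
	if (!setup_count) {
		if (cpuhp_state > 0)
			cpuhp_remove_state(cpuhp_state);
		kfree(direct);
		return -ENODEV;
	}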
>
> > + return -ENODEV;
> > + }
> > +
> > + /* Setup global config and interrupt delivery */
> > + aplic_init_hw_global(priv, false);
> > +
> > + /* Create irq domain instance for the APLIC */
> > + direct->irqdomain = irq_domain_create_linear(dev->fwnode,
> > + priv->nr_irqs + 1,
> > + &aplic_direct_irqdomain_ops,
> > + priv);
> > + if (!direct->irqdomain) {
> > + dev_err(dev, "failed to create direct irq domain\n");
> > + kfree(direct);
> > + return -ENOMEM;
> > + }
> > +
> > + /* Advertise the interrupt controller */
> > + dev_info(dev, "%d interrupts directly connected to %d CPUs\n",
> > + priv->nr_irqs, priv->nr_idcs);
> > +
> > + return 0;
> > +}
> > diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
> > new file mode 100644
> > index 000000000000..87450708a733
> > --- /dev/null
> > +++ b/drivers/irqchip/irq-riscv-aplic-main.c
> > @@ -0,0 +1,232 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> > + * Copyright (C) 2022 Ventana Micro Systems Inc.
>
> 2024 ?
Same comment as above.
>
> > + */
> > +
> > +#include <linux/of.h>
> > +#include <linux/of_irq.h>
> > +#include <linux/printk.h>
> > +#include <linux/module.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/irqchip/riscv-aplic.h>
> > +
> > +#include "irq-riscv-aplic-main.h"
> > +
> > +void aplic_irq_unmask(struct irq_data *d)
> > +{
> > + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> > +
> > + writel(d->hwirq, priv->regs + APLIC_SETIENUM);
> > +}
> > +
> > +void aplic_irq_mask(struct irq_data *d)
> > +{
> > + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> > +
> > + writel(d->hwirq, priv->regs + APLIC_CLRIENUM);
> > +}
> > +
> > +int aplic_irq_set_type(struct irq_data *d, unsigned int type)
> > +{
> > + u32 val = 0;
> > + void __iomem *sourcecfg;
> > + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> > +
> > + switch (type) {
> > + case IRQ_TYPE_NONE:
> > + val = APLIC_SOURCECFG_SM_INACTIVE;
> > + break;
> > + case IRQ_TYPE_LEVEL_LOW:
> > + val = APLIC_SOURCECFG_SM_LEVEL_LOW;
> > + break;
> > + case IRQ_TYPE_LEVEL_HIGH:
> > + val = APLIC_SOURCECFG_SM_LEVEL_HIGH;
> > + break;
> > + case IRQ_TYPE_EDGE_FALLING:
> > + val = APLIC_SOURCECFG_SM_EDGE_FALL;
> > + break;
> > + case IRQ_TYPE_EDGE_RISING:
> > + val = APLIC_SOURCECFG_SM_EDGE_RISE;
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + sourcecfg = priv->regs + APLIC_SOURCECFG_BASE;
> > + sourcecfg += (d->hwirq - 1) * sizeof(u32);
> > + writel(val, sourcecfg);
> > +
> > + return 0;
> > +}
> > +
> > +int aplic_irqdomain_translate(struct irq_fwspec *fwspec, u32 gsi_base,
> > + unsigned long *hwirq, unsigned int *type)
> > +{
> > + if (WARN_ON(fwspec->param_count < 2))
> > + return -EINVAL;
> > + if (WARN_ON(!fwspec->param[0]))
> > + return -EINVAL;
> > +
> > + /* For DT, gsi_base is always zero. */
> > + *hwirq = fwspec->param[0] - gsi_base;
> > + *type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
> > +
> > + WARN_ON(*type == IRQ_TYPE_NONE);
> > +
> > + return 0;
> > +}
> > +
> > +void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode)
> > +{
> > + u32 val;
> > +#ifdef CONFIG_RISCV_M_MODE
> > + u32 valH;
> > +
> > + if (msi_mode) {
> > + val = priv->msicfg.base_ppn;
> > + valH = ((u64)priv->msicfg.base_ppn >> 32) &
> > + APLIC_xMSICFGADDRH_BAPPN_MASK;
> > + valH |= (priv->msicfg.lhxw & APLIC_xMSICFGADDRH_LHXW_MASK)
> > + << APLIC_xMSICFGADDRH_LHXW_SHIFT;
> > + valH |= (priv->msicfg.hhxw & APLIC_xMSICFGADDRH_HHXW_MASK)
> > + << APLIC_xMSICFGADDRH_HHXW_SHIFT;
> > + valH |= (priv->msicfg.lhxs & APLIC_xMSICFGADDRH_LHXS_MASK)
> > + << APLIC_xMSICFGADDRH_LHXS_SHIFT;
> > + valH |= (priv->msicfg.hhxs & APLIC_xMSICFGADDRH_HHXS_MASK)
> > + << APLIC_xMSICFGADDRH_HHXS_SHIFT;
>
> Use FIELD_PREP for all of these.
Okay, I will update.
>
> > + writel(val, priv->regs + APLIC_xMSICFGADDR);
> > + writel(valH, priv->regs + APLIC_xMSICFGADDRH);
> > + }
> > +#endif
> > +
> > + /* Setup APLIC domaincfg register */
> > + val = readl(priv->regs + APLIC_DOMAINCFG);
> > + val |= APLIC_DOMAINCFG_IE;
> > + if (msi_mode)
> > + val |= APLIC_DOMAINCFG_DM;
> > + writel(val, priv->regs + APLIC_DOMAINCFG);
> > + if (readl(priv->regs + APLIC_DOMAINCFG) != val)
> > + dev_warn(priv->dev, "unable to write 0x%x in domaincfg\n",
> > + val);
> > +}
> > +
> > +static void aplic_init_hw_irqs(struct aplic_priv *priv)
> > +{
> > + int i;
> > +
> > + /* Disable all interrupts */
> > + for (i = 0; i <= priv->nr_irqs; i += 32)
> > + writel(-1U, priv->regs + APLIC_CLRIE_BASE +
> > + (i / 32) * sizeof(u32));
> > +
> > + /* Set interrupt type and default priority for all interrupts */
> > + for (i = 1; i <= priv->nr_irqs; i++) {
> > + writel(0, priv->regs + APLIC_SOURCECFG_BASE +
> > + (i - 1) * sizeof(u32));
> > + writel(APLIC_DEFAULT_PRIORITY,
> > + priv->regs + APLIC_TARGET_BASE +
> > + (i - 1) * sizeof(u32));
> > + }
> > +
> > + /* Clear APLIC domaincfg */
> > + writel(0, priv->regs + APLIC_DOMAINCFG);
> > +}
> > +
> > +int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
> > + void __iomem *regs)
> > +{
> > + struct of_phandle_args parent;
> > + int rc;
> > +
> > + /*
> > + * Currently, only OF fwnode is supported so extend this
> > + * function for ACPI support.
> > + */
> > + if (!is_of_node(dev->fwnode))
> > + return -EINVAL;
> > +
> > + /* Save device pointer and register base */
> > + priv->dev = dev;
> > + priv->regs = regs;
> > +
> > + /* Find out number of interrupt sources */
> > + rc = of_property_read_u32(to_of_node(dev->fwnode),
> > + "riscv,num-sources",
> > + &priv->nr_irqs);
>
> Use device_property_read_u32() which works for both ACPI and OF.
In the previous versions, we did try to unify property reading for
both ACPI and OF but MarcZ suggested to keep the ACPI and
OF probe paths totally separate hence we use OF APIs over
here because we should reach here only for OF probing.
>
> > + if (rc) {
> > + dev_err(dev, "failed to get number of interrupt sources\n");
> > + return rc;
> > + }
> > +
> > + /*
> > + * Find out number of IDCs based on parent interrupts
> > + *
> > + * If "msi-parent" property is present then we ignore the
> > + * APLIC IDCs which forces the APLIC driver to use MSI mode.
> > + */
> > + if (!of_property_present(to_of_node(dev->fwnode), "msi-parent")) {
>
> device_property_present()
Same comment as above.
>
> > + while (!of_irq_parse_one(to_of_node(dev->fwnode),
> > + priv->nr_idcs, &parent))
> > + priv->nr_idcs++;
> > + }
> > +
> > + /* Setup initial state APLIC interrupts */
> > + aplic_init_hw_irqs(priv);
> > +
> > + return 0;
> > +}
> > +
> > +static int aplic_probe(struct platform_device *pdev)
> > +{
> > + struct device *dev = &pdev->dev;
> > + bool msi_mode = false;
> > + struct resource *res;
> > + void __iomem *regs;
> > + int rc;
> > +
> > + /* Map the MMIO registers */
> > + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > + if (!res) {
> > + dev_err(dev, "failed to get MMIO resource\n");
> > + return -EINVAL;
> > + }
> > + regs = devm_ioremap(&pdev->dev, res->start, resource_size(res));
> > + if (!regs) {
> > + dev_err(dev, "failed map MMIO registers\n");
> > + return -ENOMEM;
> > + }
>
> Maybe use devm_platform_ioremap_resource() since you don't need "res"
> after that.
>
Okay, I will update.
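That would collapse the whole mapping block to something like (untested;
devm_platform_ioremap_resource() already prints an error on failure, so
both dev_err() calls can go away too):

	/* Map the MMIO registers */
	regs = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(regs))
		return PTR_ERR(regs);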
>
> > +
> > + /*
> > + * If msi-parent property is present then setup APLIC MSI
> > + * mode otherwise setup APLIC direct mode.
> > + */
> > + if (is_of_node(dev->fwnode))
> > + msi_mode = of_property_present(to_of_node(dev->fwnode),
> > + "msi-parent");
> > + if (msi_mode)
> > + rc = -ENODEV;
> > + else
> > + rc = aplic_direct_setup(dev, regs);
> > + if (rc) {
> > + dev_err(dev, "failed setup APLIC in %s mode\n",
> nitpick: maybe reword it like "Failed to setup APLIC" or "APLIC setup
> failed in %s mode"
Okay.
>
> > + msi_mode ? "MSI" : "direct");
> > + return rc;
>
> Remove this return.
>
> > + }
> > +
> > + return 0;
>
> return rc;
Okay.
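So the tail of aplic_probe() would reduce to something like this
(untested, exact message wording still open as per the nitpick above):

	if (rc)
		dev_err(dev, "failed to setup APLIC in %s mode\n",
			msi_mode ? "MSI" : "direct");

	return rc;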
>
> > +}
> > +
> > +static const struct of_device_id aplic_match[] = {
> > + { .compatible = "riscv,aplic" },
> > + {}
> > +};
> > +
> > +static struct platform_driver aplic_driver = {
> > + .driver = {
> > + .name = "riscv-aplic",
> > + .of_match_table = aplic_match,
> > + },
> > + .probe = aplic_probe,
> > +};
> > +builtin_platform_driver(aplic_driver);
> > diff --git a/drivers/irqchip/irq-riscv-aplic-main.h b/drivers/irqchip/irq-riscv-aplic-main.h
> > new file mode 100644
> > index 000000000000..474a04229334
> > --- /dev/null
> > +++ b/drivers/irqchip/irq-riscv-aplic-main.h
> > @@ -0,0 +1,45 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> > + * Copyright (C) 2022 Ventana Micro Systems Inc.
>
> 2024 ?
Same comment as above.
>
> > + */
> > +
> > +#ifndef _IRQ_RISCV_APLIC_MAIN_H
> > +#define _IRQ_RISCV_APLIC_MAIN_H
> > +
> > +#include <linux/device.h>
> > +#include <linux/io.h>
> > +#include <linux/irq.h>
> > +#include <linux/irqdomain.h>
> > +#include <linux/fwnode.h>
> > +
> > +#define APLIC_DEFAULT_PRIORITY 1
> > +
> > +struct aplic_msicfg {
> > + phys_addr_t base_ppn;
> > + u32 hhxs;
> > + u32 hhxw;
> > + u32 lhxs;
> > + u32 lhxw;
> > +};
> > +
> > +struct aplic_priv {
> > + struct device *dev;
> > + u32 gsi_base;
> > + u32 nr_irqs;
> > + u32 nr_idcs;
> > + void __iomem *regs;
> > + struct aplic_msicfg msicfg;
> > +};
> > +
> > +void aplic_irq_unmask(struct irq_data *d);
> > +void aplic_irq_mask(struct irq_data *d);
> > +int aplic_irq_set_type(struct irq_data *d, unsigned int type);
> > +int aplic_irqdomain_translate(struct irq_fwspec *fwspec, u32 gsi_base,
> > + unsigned long *hwirq, unsigned int *type);
> > +void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode);
> > +int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
> > + void __iomem *regs);
> > +int aplic_direct_setup(struct device *dev, void __iomem *regs);
> > +
> > +#endif
> > diff --git a/include/linux/irqchip/riscv-aplic.h b/include/linux/irqchip/riscv-aplic.h
> > new file mode 100644
> > index 000000000000..97e198ea0109
> > --- /dev/null
> > +++ b/include/linux/irqchip/riscv-aplic.h
> > @@ -0,0 +1,119 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> > + * Copyright (C) 2022 Ventana Micro Systems Inc.
> > + */
> > +#ifndef __LINUX_IRQCHIP_RISCV_APLIC_H
> > +#define __LINUX_IRQCHIP_RISCV_APLIC_H
> > +
> > +#include <linux/bitops.h>
> > +
> > +#define APLIC_MAX_IDC BIT(14)
> > +#define APLIC_MAX_SOURCE 1024
> > +
> > +#define APLIC_DOMAINCFG 0x0000
> > +#define APLIC_DOMAINCFG_RDONLY 0x80000000
> > +#define APLIC_DOMAINCFG_IE BIT(8)
> > +#define APLIC_DOMAINCFG_DM BIT(2)
> > +#define APLIC_DOMAINCFG_BE BIT(0)
> > +
> > +#define APLIC_SOURCECFG_BASE 0x0004
> > +#define APLIC_SOURCECFG_D BIT(10)
> > +#define APLIC_SOURCECFG_CHILDIDX_MASK 0x000003ff
> > +#define APLIC_SOURCECFG_SM_MASK 0x00000007
> > +#define APLIC_SOURCECFG_SM_INACTIVE 0x0
> > +#define APLIC_SOURCECFG_SM_DETACH 0x1
> > +#define APLIC_SOURCECFG_SM_EDGE_RISE 0x4
> > +#define APLIC_SOURCECFG_SM_EDGE_FALL 0x5
> > +#define APLIC_SOURCECFG_SM_LEVEL_HIGH 0x6
> > +#define APLIC_SOURCECFG_SM_LEVEL_LOW 0x7
> > +
> > +#define APLIC_MMSICFGADDR 0x1bc0
> > +#define APLIC_MMSICFGADDRH 0x1bc4
> > +#define APLIC_SMSICFGADDR 0x1bc8
> > +#define APLIC_SMSICFGADDRH 0x1bcc
> > +
> > +#ifdef CONFIG_RISCV_M_MODE
> > +#define APLIC_xMSICFGADDR APLIC_MMSICFGADDR
> > +#define APLIC_xMSICFGADDRH APLIC_MMSICFGADDRH
> > +#else
> > +#define APLIC_xMSICFGADDR APLIC_SMSICFGADDR
> > +#define APLIC_xMSICFGADDRH APLIC_SMSICFGADDRH
> > +#endif
> > +
> > +#define APLIC_xMSICFGADDRH_L BIT(31)
> > +#define APLIC_xMSICFGADDRH_HHXS_MASK 0x1f
> > +#define APLIC_xMSICFGADDRH_HHXS_SHIFT 24
> > +#define APLIC_xMSICFGADDRH_LHXS_MASK 0x7
> > +#define APLIC_xMSICFGADDRH_LHXS_SHIFT 20
> > +#define APLIC_xMSICFGADDRH_HHXW_MASK 0x7
> > +#define APLIC_xMSICFGADDRH_HHXW_SHIFT 16
> > +#define APLIC_xMSICFGADDRH_LHXW_MASK 0xf
> > +#define APLIC_xMSICFGADDRH_LHXW_SHIFT 12
> > +#define APLIC_xMSICFGADDRH_BAPPN_MASK 0xfff
> > +
> > +#define APLIC_xMSICFGADDR_PPN_SHIFT 12
> > +
> > +#define APLIC_xMSICFGADDR_PPN_HART(__lhxs) \
> > + (BIT(__lhxs) - 1)
> > +
> > +#define APLIC_xMSICFGADDR_PPN_LHX_MASK(__lhxw) \
> > + (BIT(__lhxw) - 1)
> > +#define APLIC_xMSICFGADDR_PPN_LHX_SHIFT(__lhxs) \
> > + ((__lhxs))
> > +#define APLIC_xMSICFGADDR_PPN_LHX(__lhxw, __lhxs) \
> > + (APLIC_xMSICFGADDR_PPN_LHX_MASK(__lhxw) << \
> > + APLIC_xMSICFGADDR_PPN_LHX_SHIFT(__lhxs))
> > +
> > +#define APLIC_xMSICFGADDR_PPN_HHX_MASK(__hhxw) \
> > + (BIT(__hhxw) - 1)
> > +#define APLIC_xMSICFGADDR_PPN_HHX_SHIFT(__hhxs) \
> > + ((__hhxs) + APLIC_xMSICFGADDR_PPN_SHIFT)
> > +#define APLIC_xMSICFGADDR_PPN_HHX(__hhxw, __hhxs) \
> > + (APLIC_xMSICFGADDR_PPN_HHX_MASK(__hhxw) << \
> > + APLIC_xMSICFGADDR_PPN_HHX_SHIFT(__hhxs))
> > +
> > +#define APLIC_IRQBITS_PER_REG 32
> > +
> > +#define APLIC_SETIP_BASE 0x1c00
> > +#define APLIC_SETIPNUM 0x1cdc
> > +
> > +#define APLIC_CLRIP_BASE 0x1d00
> > +#define APLIC_CLRIPNUM 0x1ddc
> > +
> > +#define APLIC_SETIE_BASE 0x1e00
> > +#define APLIC_SETIENUM 0x1edc
> > +
> > +#define APLIC_CLRIE_BASE 0x1f00
> > +#define APLIC_CLRIENUM 0x1fdc
> > +
> > +#define APLIC_SETIPNUM_LE 0x2000
> > +#define APLIC_SETIPNUM_BE 0x2004
> > +
> > +#define APLIC_GENMSI 0x3000
> > +
> > +#define APLIC_TARGET_BASE 0x3004
> > +#define APLIC_TARGET_HART_IDX_SHIFT 18
> > +#define APLIC_TARGET_HART_IDX_MASK 0x3fff
> > +#define APLIC_TARGET_GUEST_IDX_SHIFT 12
> > +#define APLIC_TARGET_GUEST_IDX_MASK 0x3f
> > +#define APLIC_TARGET_IPRIO_MASK 0xff
> > +#define APLIC_TARGET_EIID_MASK 0x7ff
> > +
> > +#define APLIC_IDC_BASE 0x4000
> > +#define APLIC_IDC_SIZE 32
> > +
> > +#define APLIC_IDC_IDELIVERY 0x00
> > +
> > +#define APLIC_IDC_IFORCE 0x04
> > +
> > +#define APLIC_IDC_ITHRESHOLD 0x08
> > +
> > +#define APLIC_IDC_TOPI 0x18
> > +#define APLIC_IDC_TOPI_ID_SHIFT 16
> > +#define APLIC_IDC_TOPI_ID_MASK 0x3ff
> > +#define APLIC_IDC_TOPI_PRIO_MASK 0xff
> > +
> > +#define APLIC_IDC_CLAIMI 0x1c
> > +
> > +#endif
>
>
> Thanks,
>
> Clément
Regards,
Anup
On 02/02/2024 11:30, Anup Patel wrote:
>>> +int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
>>> + void __iomem *regs)
>>> +{
>>> + struct of_phandle_args parent;
>>> + int rc;
>>> +
>>> + /*
>>> + * Currently, only OF fwnode is supported so extend this
>>> + * function for ACPI support.
>>> + */
>>> + if (!is_of_node(dev->fwnode))
>>> + return -EINVAL;
>>> +
>>> + /* Save device pointer and register base */
>>> + priv->dev = dev;
>>> + priv->regs = regs;
>>> +
>>> + /* Find out number of interrupt sources */
>>> + rc = of_property_read_u32(to_of_node(dev->fwnode),
>>> + "riscv,num-sources",
>>> + &priv->nr_irqs);
>>
>> Use device_property_read_u32() which works for both ACPI and OF.
>
> In the previous versions, we did try to unify property reading for
> both ACPI and OF but MarcZ suggested to keep the ACPI and
> OF probe paths totally separate hence we use OF APIs over
> here because we should reach here only for OF probing.
Ok, indeed it makes sense. Discard that comment then !
Thanks,
Clément
Anup Patel <[email protected]> writes:
> The Linux platform MSI support allows per-device MSI domains so let
> us add a platform irqchip driver for RISC-V IMSIC which provides a
> base IRQ domain with MSI parent support for platform device domains.
>
> This driver assumes that the IMSIC state is already initialized by
> the IMSIC early driver.
>
> Signed-off-by: Anup Patel <[email protected]>
[...]
> diff --git a/drivers/irqchip/irq-riscv-imsic-platform.c b/drivers/irqchip/irq-riscv-imsic-platform.c
> new file mode 100644
> index 000000000000..65791a6b0727
> --- /dev/null
> +++ b/drivers/irqchip/irq-riscv-imsic-platform.c
> @@ -0,0 +1,371 @@
[...]
> +static int imsic_irq_retrigger(struct irq_data *d)
> +{
> + struct imsic_vector *vec = irq_data_get_irq_chip_data(d);
> + struct imsic_local_config *local;
> +
> + if (WARN_ON(vec == NULL))
> + return -ENOENT;
> +
> + local = per_cpu_ptr(imsic->global.local, vec->cpu);
> + writel(vec->local_id, local->msi_va);
Change to writel_relaxed().
Hi Anup,
Anup Patel <[email protected]> writes:
> The RISC-V AIA specification is ratified as-per the RISC-V international
> process. The latest ratified AIA specifcation can be found at:
> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>
> At a high-level, the AIA specification adds three things:
> 1) AIA CSRs
> - Improved local interrupt support
> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> - Per-HART MSI controller
> - Support MSI virtualization
> - Support IPI along with virtualization
> 3) Advanced Platform-Level Interrupt Controller (APLIC)
> - Wired interrupt controller
> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> - In Direct-mode, injects external interrupts directly into HARTs
>
> For an overview of the AIA specification, refer the AIA virtualization
> talk at KVM Forum 2022:
> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> https://www.youtube.com/watch?v=r071dL8Z0yo
Thank you for continuing to work on this series! I like this
direction of the series!
TL;DR: I think we can get rid of most of the id/householding data
structures, except for the irq matrix.
Most of my comments are more of a design/overview nature, so I'll
comment here in the cover letter.
I took the series for a spin, and with Alex' ftrace fix it passes all
my tests nicely!
Now some thoughts/comments (I'm coming from the x86 side of things!):
id/enable-tracking: There are a lot of different id/enabled tracking
with corresponding locks, where there's IMO overlap with what the
matrix provides.
Let's start with struct imsic_priv:
| /* Dummy HW interrupt numbers */
| unsigned int nr_hwirqs;
| raw_spinlock_t hwirqs_lock;
| unsigned long *hwirqs_used_bitmap;
These are used for the domain routing (hwirq -> desc/virq), and not
needed. Just use the same id as virq (at allocation time), and get rid
of these data structures/corresponding functions. The lookup in the
interrupt handler via imsic_local_priv.vectors doesn't care about
hwirq. This is what x86 does... The imsic_vector roughly corresponds
to apic_chip_data (nit: imsic_vector could have the chip_data suffix
as well, at least it would have helped me!)
Moving/affinity changes. The moving of a vector to another CPU
currently involves:
1. Allocate a new vector from the matrix
2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested
spinlocks)
3. Trigger two IPIs to apply the bitmap
4. On each CPU target (imsic_local_sync()) loop the bitmap and flip
all bits, and potentially rearm
This seems a bit heavy-weight: Why are you explicitly setting/clearing
all the bits in a loop at the local sync?
x86 does it a bit differently (more lazily): The chip_data has
prev_{cpu,vector}/move_in_progress fields, and keep both vectors
enabled until there's an interrupt on the new vector, and then the old
one is cleaned (irq_complete_move()).
Further; When it's time to remove the old vector, x86 doesn't trigger
an IPI on the disabling side, but queues a cleanup job on a per-cpu
list and triggers a timeout. So, the per-cpu chip_data (per-cpu
"vectors" in your series) can reside in two places during the transit.
I wonder if this clean up is less intrusive, and you just need to
perform what's in the per-list instead of dealing with the
ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
chip_data/desc has that information. This would mean that
imsic_local_priv() would only have the local vectors (chip_data), and
a cleanup list/timer.
My general comment is that instead of having these global id-tracking
structures, use the matrix together with some desc/chip_data local
data, which should be sufficient.
Random thought: Do we need to explicitly disable (csr) the vector,
when we're changing the affinity? What if we just leave it enabled,
and only when mask/unmask is performed it's actually explicitly masked
(writes to the csr)?
Missing features (which can be added later):
* Reservation mode/activate support (allocate many MSI, but only
request/activate a subset)
* Handle managed interrupts
* There might be some irqd flags are missing, which mostly cpuhp care
about (e.g. irqd_*_single_target())...
Finally; Given that the APLIC requires a lot more patches, depending
on how the review process moves on -- maybe the IMSIC side could go as
a separate series?
Cheers,
Björn
On Tue, Feb 6, 2024 at 9:09 PM Björn Töpel <[email protected]> wrote:
>
> Hi Anup,
>
> Anup Patel <[email protected]> writes:
>
> > The RISC-V AIA specification is ratified as-per the RISC-V international
> > process. The latest ratified AIA specifcation can be found at:
> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >
> > At a high-level, the AIA specification adds three things:
> > 1) AIA CSRs
> > - Improved local interrupt support
> > 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> > - Per-HART MSI controller
> > - Support MSI virtualization
> > - Support IPI along with virtualization
> > 3) Advanced Platform-Level Interrupt Controller (APLIC)
> > - Wired interrupt controller
> > - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> > - In Direct-mode, injects external interrupts directly into HARTs
> >
> > For an overview of the AIA specification, refer the AIA virtualization
> > talk at KVM Forum 2022:
> > https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> > https://www.youtube.com/watch?v=r071dL8Z0yo
>
> Thank you for continuing to work on this series! I like this
> direction of the series!
>
> TL;DR: I think we can get rid of most of the id/householding data
> structures, except for the irq matrix.
>
> Most of my comments are more of a design/overview nature, so I'll
> comment here in the cover letter.
>
> I took the series for a spin, and with Alex' ftrace fix it passes all
> my tests nicely!
>
> Now some thoughts/comments (I'm coming from the x86 side of things!):
>
> id/enable-tracking: There are a lot of different id/enabled tracking
> with corresponding locks, where there's IMO overlap with what the
> matrix provides.
The matrix allocator does not track the enabled/disabled state of
the per-CPU IDs. This is why we have a separate per-CPU
ids_enabled_bitmap which is also used for remote synchronization
across CPUs.
>
> Let's start with struct imsic_priv:
>
> | /* Dummy HW interrupt numbers */
> | unsigned int nr_hwirqs;
> | raw_spinlock_t hwirqs_lock;
> | unsigned long *hwirqs_used_bitmap;
The matrix allocator manages actual IDs for each CPU whereas
the Linux irq_data expects a fixed hwirq which does not change.
Due to this, we have a dummy hwirq space which is always
fixed. The only thing that is changed under-the-hood by the
IMSIC driver is the dummy hwirq to actual HW vector (cpu, id)
mapping.
>
> These are used for the domain routing (hwirq -> desc/virq), and not
> needed. Just use the same id as virq (at allocation time), and get rid
> of these data structures/corresponding functions. The lookup in the
> interrupt handler via imsic_local_priv.vectors doesn't care about
> hwirq. This is what x86 does... The imsic_vector roughly corresponds
> to apic_chip_data (nit: imsic_vector could have the chip_data suffix
> as well, at least it would have helped me!)
Yes, imsic_vector corresponds to apic_chip_data in the x86 world.
>
> Moving/affinity changes. The moving of a vector to another CPU
> currently involves:
>
> 1. Allocate a new vector from the matrix
> 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested
> spinlocks)
> 3. Trigger two IPIs to apply the bitmap
> 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip
> all bits, and potentially rearm
>
> This seems a bit heavy-weight: Why are you explicitly setting/clearing
> all the bits in a loop at the local sync?
This can certainly be optimized by introducing another
ids_dirty_bitmap. I will add this in the next revision.
>
> x86 does it a bit differently (more lazily): The chip_data has
> prev_{cpu,vector}/move_in_progress fields, and keep both vectors
> enabled until there's an interrupt on the new vector, and then the old
> one is cleaned (irq_complete_move()).
>
> Further; When it's time to remove the old vector, x86 doesn't trigger
> an IPI on the disabling side, but queues a cleanup job on a per-cpu
> list and triggers a timeout. So, the per-cpu chip_data (per-cpu
> "vectors" in your series) can reside in two places during the transit.
We can't avoid IPIs when moving vectors from one CPU to another
CPU because IMSIC id enable/disable is only possible through
CSRs. Also, keep in mind that irq affinity change might be initiated
on CPU X for some interrupt targeting CPU Y which is then changed
to target CPU Z.
In the case of x86, they have memory mapped registers which
allows one CPU to enable/disable the ID of another CPU.
>
> I wonder if this clean up is less intrusive, and you just need to
> perform what's in the per-list instead of dealing with the
> ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
> chip_data/desc has that information. This would mean that
> imsic_local_priv() would only have the local vectors (chip_data), and
> a cleanup list/timer.
>
> My general comment is that instead of having these global id-tracking
> structures, use the matrix together with some desc/chip_data local
> data, which should be sufficient.
The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors
are required since the matrix allocator only manages allocation of
per-CPU IDs.
>
> Random thought: Do we need to explicitly disable (csr) the vector,
> when we're changing the affinity? What if we just leave it enabled,
> and only when mask/unmask is performed it's actually explicitly masked
> (writes to the csr)?
We should not leave it enabled because some rogue/buggy device
can inject spurious interrupts using MSI writes to unused enabled
interrupts.
>
> Missing features (which can be added later):
> * Reservation mode/activate support (allocate many MSI, but only
> request/activate a subset)
I did not see any PCIe or platform device requiring this kind of
reservation. Any examples ?
> * Handle managed interrupts
Any examples of managed interrupts in the RISC-V world ?
> * There might be some irqd flags are missing, which mostly cpuhp care
> about (e.g. irqd_*_single_target())...
Okay, let me check and update.
>
> Finally; Given that the APLIC requires a lot more patches, depending
> on how the review process moves on -- maybe the IMSIC side could go as
> a separate series?
>
The most popular implementation choice across RISC-V platforms
will be IMSIC + APLIC so both drivers should go together. In fact,
we need both drivers for the QEMU virt machine as well because
UART interrupt (and other wired interrupts) on the QEMU virt
machine goes through APLIC.
Regards,
Anup
Hi!
Anup Patel <[email protected]> writes:
> On Tue, Feb 6, 2024 at 9:09 PM Björn Töpel <[email protected]> wrote:
>>
>> Hi Anup,
>>
>> Anup Patel <[email protected]> writes:
>>
>> > The RISC-V AIA specification is ratified as-per the RISC-V international
>> > process. The latest ratified AIA specifcation can be found at:
>> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>> >
>> > At a high-level, the AIA specification adds three things:
>> > 1) AIA CSRs
>> > - Improved local interrupt support
>> > 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>> > - Per-HART MSI controller
>> > - Support MSI virtualization
>> > - Support IPI along with virtualization
>> > 3) Advanced Platform-Level Interrupt Controller (APLIC)
>> > - Wired interrupt controller
>> > - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
>> > - In Direct-mode, injects external interrupts directly into HARTs
>> >
>> > For an overview of the AIA specification, refer the AIA virtualization
>> > talk at KVM Forum 2022:
>> > https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
>> > https://www.youtube.com/watch?v=r071dL8Z0yo
>>
>> Thank you for continuing to work on this series! I like this
>> direction of the series!
>>
>> TL;DR: I think we can get rid of most of the id/householding data
>> structures, except for the irq matrix.
>>
>> Most of my comments are more of a design/overview nature, so I'll
>> comment here in the cover letter.
>>
>> I took the series for a spin, and with Alex' ftrace fix it passes all
>> my tests nicely!
>>
>> Now some thoughts/comments (I'm coming from the x86 side of things!):
>>
>> id/enable-tracking: There are a lot of different id/enabled tracking
>> with corresponding locks, where there's IMO overlap with what the
>> matrix provides.
>
> The matrix allocator does not track the enabled/disabled state of
> the per-CPU IDs. This is why we have a separate per-CPU
> ids_enabled_bitmap which is also used for remote synchronization
> across CPUs.
Exactly, but what I'm asking is if that structure is really needed. More
below.
>> Let's start with struct imsic_priv:
>>
>> | /* Dummy HW interrupt numbers */
>> | unsigned int nr_hwirqs;
>> | raw_spinlock_t hwirqs_lock;
>> | unsigned long *hwirqs_used_bitmap;
>
> The matrix allocator manages actual IDs for each CPU whereas
> the Linux irq_data expects a fixed hwirq which does not change.
>
> Due to this, we have a dummy hwirq space which is always
> fixed. The only thing that is changed under-the-hood by the
> IMSIC driver is the dummy hwirq to actual HW vector (cpu, id)
> mapping.
Read below. I'm not talking about local_id from the irq_matrix, I'm
saying use virq, which has the properties you're asking for, and doesn't
require an additional structure. When an irq/desc is allocated, you have
a nice unique number with the virq for the lifetime of the interrupt.
>> These are used for the domain routing (hwirq -> desc/virq), and not
>> needed. Just use the same id as virq (at allocation time), and get rid
>> of these data structures/corresponding functions. The lookup in the
>> interrupt handler via imsic_local_priv.vectors doesn't care about
>> hwirq. This is what x86 does... The imsic_vector roughly corresponds
>> to apic_chip_data (nit: imsic_vector could have the chip_data suffix
>> as well, at least it would have helped me!)
>
> Yes, imsic_vector corresponds to apic_chip_data in the x86 world.
...and I'm trying to ask the following: given that the IMSIC is pretty
much x86 vector (arch/x86/kernel/apic/vector.c), I'm trying to figure
out the rationale for why IMSIC has all the extra householding data not
needed by x86. The x86 code has been battle proven, having had to deal
with all kinds of quirks (e.g. lost interrupts on affinity changes).
>> Moving/affinity changes. The moving of a vector to another CPU
>> currently involves:
>>
>> 1. Allocate a new vector from the matrix
>> 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested
>> spinlocks)
>> 3. Trigger two IPIs to apply the bitmap
>> 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip
>> all bits, and potentially rearm
>>
>> This seems a bit heavy-weight: Why are you explicitly setting/clearing
>> all the bits in a loop at the local sync?
>
> This can be certainly optimized by introducing another
> ids_dirty_bitmap. I will add this in the next revision.
I'd rather have fewer maps and fewer locks! ;-)
>> x86 does it a bit differently (more lazily): The chip_data has
>> prev_{cpu,vector}/move_in_progress fields, and keep both vectors
>> enabled until there's an interrupt on the new vector, and then the old
>> one is cleaned (irq_complete_move()).
>>
>> Further; When it's time to remove the old vector, x86 doesn't trigger
>> an IPI on the disabling side, but queues a cleanup job on a per-cpu
>> list and triggers a timeout. So, the per-cpu chip_data (per-cpu
>> "vectors" in your series) can reside in two places during the transit.
>
> We can't avoid IPIs when moving vectors from one CPU to another
> CPU because IMSIC id enable/disable is only possible through
> CSRs. Also, keep in-mind that irq affinity change might be initiated
> on CPU X for some interrupt targeting CPU Y which is then changed
> to target CPU Z.
>
> In the case of x86, they have memory mapped registers which
> allows one CPU to enable/disable the ID of another CPU.
Nope. Same mechanics on x86 -- the cleanup has to be done on the
originating core. What I asked was "what about using a timer instead of
an IPI". I think this was up in the last rev as well?
Check out commit bdc1dad299bb ("x86/vector: Replace
IRQ_MOVE_CLEANUP_VECTOR with a timer callback") Specifically, the
comment about lost interrupts, and the rational for keeping the original
target active until there's a new interrupt on the new cpu.
>> I wonder if this clean up is less intrusive, and you just need to
>> perform what's in the per-list instead of dealing with the
>> ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
>> chip_data/desc has that information. This would mean that
>> imsic_local_priv() would only have the local vectors (chip_data), and
>> a cleanup list/timer.
>>
>> My general comment is that instead of having these global id-tracking
>> structures, use the matrix together with some desc/chip_data local
>> data, which should be sufficient.
>
> The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors
> are required since the matrix allocator only manages allocation of
> per-CPU IDs.
The information in ids_enabled_bitmap is/could be inherent in
imsic_local_priv.vectors (guess what x86 does... ;-)).
Dummy hwirqs could be replaced with the virq.
Hmm, seems like we're talking past each other, or at least I get the
feeling I can't get my opinions out right. I'll try to do a quick PoC
to show you what I mean. That's probably easier than just talking about
it. ...and maybe I'll end up realizing I'm all wrong!
My reaction is -- you're doing a lot of householding with a lot of
locks, and my worry is that we'll just end up with the same issues/bloat
that x86 once had (has? ;-)).
>> Random thought: Do we need to explicitly disable (csr) the vector,
>> when we're changing the affinity? What if we just leave it enabled,
>> and only when mask/unmask is performed it's actually explicitly masked
>> (writes to the csr)?
>
> We should not leave it enabled because some rough/buggy device
> can inject spurious interrupts using MSI writes to unused enabled
> interrupts.
OK!
>>
>> Missing features (which can be added later):
>> * Reservation mode/activate support (allocate many MSI, but only
>> request/activate a subset)
>
> I did not see any PCIe or platform device requiring this kind of
> reservation. Any examples ?
It's not a requirement. Some devices allocate a gazillion interrupts
(NICs with many QoS queues, e.g.), but only activate a subset (via
request_irq()). A system using these kinds of devices might run out of
interrupts.
Problems you run into once you leave the embedded world, pretty much.
>> * Handle managed interrupts
>
> Any examples of managed interrupts in the RISC-V world ?
E.g. all nvme drives: nvme_setup_irqs(), and I'd assume contemporary
netdev drivers would use it. Typically devices with per-cpu queues.
>> * There might be some irqd flags are missing, which mostly cpuhp care
>> about (e.g. irqd_*_single_target())...
>
> Okay, let me check and update.
I haven't dug much into cpuhp, so I'm out on a limb here...
>> Finally; Given that the APLIC requires a lot more patches, depending
>> on how the review process moves on -- maybe the IMSIC side could go as
>> a separate series?
>>
>
> The most popular implementation choice across RISC-V platforms
> will be IMSIC + APLIC so both drivers should go together. In fact,
> we need both drivers for the QEMU virt machine as well because
> UART interrupt (and other wired interrupts) on the QEMU virt
> machine goes through APLIC.
Thanks for clearing that up! Hmm, an IMSIC-only QEMU machine would be awesome.
Cheers,
Björn
On Wed, Feb 7, 2024 at 12:57 PM Björn Töpel <[email protected]> wrote:
>
> Hi!
>
> Anup Patel <[email protected]> writes:
>
> > On Tue, Feb 6, 2024 at 9:09 PM Björn Töpel <[email protected]> wrote:
> >>
> >> Hi Anup,
> >>
> >> Anup Patel <[email protected]> writes:
> >>
> >> > The RISC-V AIA specification is ratified as-per the RISC-V international
> >> > process. The latest ratified AIA specifcation can be found at:
> >> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >> >
> >> > At a high-level, the AIA specification adds three things:
> >> > 1) AIA CSRs
> >> > - Improved local interrupt support
> >> > 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> >> > - Per-HART MSI controller
> >> > - Support MSI virtualization
> >> > - Support IPI along with virtualization
> >> > 3) Advanced Platform-Level Interrupt Controller (APLIC)
> >> > - Wired interrupt controller
> >> > - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> >> > - In Direct-mode, injects external interrupts directly into HARTs
> >> >
> >> > For an overview of the AIA specification, refer the AIA virtualization
> >> > talk at KVM Forum 2022:
> >> > https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> >> > https://www.youtube.com/watch?v=r071dL8Z0yo
> >>
> >> Thank you for continuing to work on this series! I like this
> >> direction of the series!
> >>
> >> TL;DR: I think we can get rid of most of the id/householding data
> >> structures, except for the irq matrix.
> >>
> >> Most of my comments are more of a design/overview nature, so I'll
> >> comment here in the cover letter.
> >>
> >> I took the series for a spin, and with Alex' ftrace fix it passes all
> >> my tests nicely!
> >>
> >> Now some thoughts/comments (I'm coming from the x86 side of things!):
> >>
> >> id/enable-tracking: There are a lot of different id/enabled tracking
> >> with corresponding locks, where there's IMO overlap with what the
> >> matrix provides.
> >
> > The matrix allocator does not track the enabled/disabled state of
> > the per-CPU IDs. This is why we have a separate per-CPU
> > ids_enabled_bitmap which is also used for remote synchronization
> > across CPUs.
>
> Exactly, but what I'm asking is if that structure is really needed. More
> below.
>
> >> Let's start with struct imsic_priv:
> >>
> >> | /* Dummy HW interrupt numbers */
> >> | unsigned int nr_hwirqs;
> >> | raw_spinlock_t hwirqs_lock;
> >> | unsigned long *hwirqs_used_bitmap;
> >
> > The matrix allocator manages actual IDs for each CPU whereas
> > the Linux irq_data expects a fixed hwirq which does not change.
> >
> > Due to this, we have a dummy hwirq space which is always
> > fixed. The only thing that is changed under-the-hood by the
> > IMSIC driver is the dummy hwirq to actual HW vector (cpu, id)
> > mapping.
>
> Read below. I'm not talking about local_id from the irq_matrix, I'm
> saying use virq, which has the properties you're asking for, and doesn't
> require an additional structure. When an irq/desc is allocated, you have
> a nice unique number with the virq for the lifetime of the interrupt.
Sure, let me explore using virq in-place of hwirq.
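For the record, a rough sketch of what using virq as the fixed hwirq could
look like (illustrative only, not code from this series; imsic_vector_alloc()
and the chip name below are placeholders):

static int imsic_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
                                  unsigned int nr_irqs, void *args)
{
        struct imsic_vector *vec;

        /* Allocate an actual (cpu, id) pair from the matrix allocator */
        vec = imsic_vector_alloc(virq, cpu_online_mask);
        if (!vec)
                return -ENOSPC;

        /* Reuse virq as the fixed hwirq seen by irq_data; no dummy hwirq bitmap */
        irq_domain_set_info(domain, virq, virq, &imsic_irq_base_chip, vec,
                            handle_simple_irq, NULL, NULL);
        return 0;
}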
>
> >> These are used to for the domain routing (hwirq -> desc/virq), and not
> >> needed. Just use the same id as virq (at allocation time), and get rid
> >> of these data structures/corresponding functions. The lookup in the
> >> interrupt handler via imsic_local_priv.vectors doesn't care about
> >> hwirq. This is what x86 does... The imsic_vector roughly corresponds
> >> to apic_chip_data (nit: imsic_vector could have the chip_data suffix
> >> as well, at least it would have helped me!)
> >
> > Yes, imsic_vector corresponds to apic_chip_data in the x86 world.
>
> ...and I'm trying to ask the following; Given the IMSIC is pretty much
> x86 vector (arch/x86/kernel/apic/vector.c), I'm trying to figure out the
> rationale why IMSIC has all the extra housekeeping data, not needed by
> x86. The x86 code has been battle-proven, and has had to deal with all kinds
> of quirks (e.g. lost interrupts on affinity changes).
Understood.
>
> >> Moving/affinity changes. The moving of a vector to another CPU
> >> currently involves:
> >>
> >> 1. Allocate a new vector from the matrix
> >> 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested
> >> spinlocks)
> >> 3. Trigger two IPIs to apply the bitmap
> >> 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip
> >> all bits, and potentially rearm
> >>
> >> This seems a bit heavy-weight: Why are you explicitly setting/clearing
> >> all the bits in a loop at the local sync?
> >
> > This can be certainly optimized by introducing another
> > ids_dirty_bitmap. I will add this in the next revision.
>
> I rather have fewer maps, and less locks! ;-)
>
> >> x86 does it a bit differently (more lazily): The chip_data has
> >> prev_{cpu,vector}/move_in_progress fields, and keep both vectors
> >> enabled until there's an interrupt on the new vector, and then the old
> >> one is cleaned (irq_complete_move()).
> >>
> >> Further; When it's time to remove the old vector, x86 doesn't trigger
> >> an IPI on the disabling side, but queues a cleanup job on a per-cpu
> >> list and triggers a timeout. So, the per-cpu chip_data (per-cpu
> >> "vectors" in your series) can reside in two places during the transit.
> >
> > We can't avoid IPIs when moving vectors from one CPU to another
> > CPU because IMSIC id enable/disable is only possible through
> > CSRs. Also, keep in-mind that irq affinity change might be initiated
> > on CPU X for some interrupt targeting CPU Y which is then changed
> > to target CPU Z.
> >
> > In the case of x86, they have memory mapped registers which
> > allows one CPU to enable/disable the ID of another CPU.
>
> Nope. Same mechanics on x86 -- the cleanup has to be done on the
> originating core. What I asked was "what about using a timer instead of
> an IPI". I think this was up in the last rev as well?
>
> Check out commit bdc1dad299bb ("x86/vector: Replace
> IRQ_MOVE_CLEANUP_VECTOR with a timer callback"). Specifically, the
> comment about lost interrupts, and the rationale for keeping the original
> target active until there's a new interrupt on the new cpu.
Trying timer interrupt is still TBD on my side because with v12
my goal was to implement per-device MSI domains. Let me
explore timer interrupts for v13.
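To capture the idea for v13, a very rough sketch of the x86-style lazy
cleanup being discussed (illustrative only; the struct, its fields, and the
cleanup helper below are assumptions, not code from this series):

struct moved_vector {
        unsigned int cpu, local_id;
        unsigned int prev_cpu, prev_id;
        bool move_in_progress;
};

static void queue_old_vector_cleanup(unsigned int cpu, unsigned int id)
{
        /* Placeholder: add (cpu, id) to a per-CPU cleanup list and arm a
         * timer on that CPU to disable and release the old vector later. */
}

/* Called from the interrupt handler on the new target CPU */
static void complete_vector_move(struct moved_vector *vec)
{
        if (!vec->move_in_progress)
                return;
        vec->move_in_progress = false;
        queue_old_vector_cleanup(vec->prev_cpu, vec->prev_id);
}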
>
> >> I wonder if this clean up is less intrusive, and you just need to
> >> perform what's in the per-list instead of dealing with the
> >> ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
> >> chip_data/desc has that information. This would mean that
> >> imsic_local_priv() would only have the local vectors (chip_data), and
> >> a cleanup list/timer.
> >>
> >> My general comment is that instead of having these global id-tracking
> >> structures, use the matrix together with some desc/chip_data local
> >> data, which should be sufficient.
> >
> > The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors
> > are required since the matrix allocator only manages allocation of
> > per-CPU IDs.
>
> The information in ids_enabled_bitmap is/could be inherent in
> imsic_local_priv.vectors (guess what x86 does... ;-)).
>
> Dummy hwirqs could be replaced with the virq.
>
> Hmm, seems like we're talking past each other, or at least I get the
> feeling I can't get my opinions out right. I'll try to do a quick PoC,
> to show you what I mean. That's probably easier than just talking about
> it. ...and maybe I'll come realizing I'm all wrong!
I suggest to wait for my v13 and try something on top of that
otherwise we might duplicate efforts.
>
> My reaction is -- you're doing a lot of housekeeping with a lot of
> locks, and my worry is that we'll just end up with the same issues/bloat
> that x86 once had (has? ;-)).
>
> >> Random thought: Do we need to explicitly disable (csr) the vector,
> >> when we're changing the affinity? What if we just leave it enabled,
> >> and only when mask/unmask is performed it's actually explicitly masked
> >> (writes to the csr)?
> >
> > We should not leave it enabled because some rogue/buggy device
> > can inject spurious interrupts using MSI writes to unused enabled
> > interrupts.
>
> OK!
>
> >>
> >> Missing features (which can be added later):
> >> * Reservation mode/activate support (allocate many MSI, but only
> >> request/activate a subset)
> >
> > I did not see any PCIe or platform device requiring this kind of
> > reservation. Any examples ?
>
> It's not a requirement. Some devices allocate a gazillion interrupts
> (NICs with many QoS queues, e.g.), but only activate a subset (via
> request_irq()). A system using these kind of devices might run out of
> interrupts.
I don't see how this is not possible currently.
>
> Problems you run into once you leave the embedded world, pretty much.
>
> >> * Handle managed interrupts
> >
> > Any examples of managed interrupts in the RISC-V world ?
>
> E.g. all nvme drives: nvme_setup_irqs(), and I'd assume contemporary
> netdev drivers would use it. Typically devices with per-cpu queues.
We have tested with NVMe devices, e1000e, VirtIO-net, etc and I did
not see any issue.
We can always add new features as separate incremental series as long
as there is a clear use-case backed by real-world devices.
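For context, this is the generic PCI API such drivers use to get managed
interrupts (shown purely for illustration; pdev and nr_io_queues are assumed
to come from the driver):

static int alloc_managed_vectors(struct pci_dev *pdev, unsigned int nr_io_queues)
{
        struct irq_affinity affd = { .pre_vectors = 1 };  /* e.g. one admin vector */

        /* The core spreads the remaining vectors across CPUs and marks them managed */
        return pci_alloc_irq_vectors_affinity(pdev, 2, nr_io_queues + 1,
                                               PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
                                               &affd);
}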
>
> >> * There might be some irqd flags are missing, which mostly cpuhp care
> >> about (e.g. irqd_*_single_target())...
> >
> > Okay, let me check and update.
>
> I haven't dug much into cpuhp, so I'm out on a limb here...
>
> >> Finally; Given that the APLIC requires a lot more patches, depending
> >> on how the review process moves on -- maybe the IMSIC side could go as
> >> a separate series?
> >>
> >
> > The most popular implementation choice across RISC-V platforms
> > will be IMSIC + APLIC so both drivers should go together. In fact,
> > we need both drivers for the QEMU virt machine as well because
> > UART interrupt (and other wired interrupts) on the QEMU virt
> > machine goes through APLIC.
>
> Thanks for clearing that out! Hmm, an IMSIC only QEMU would be awesome.
>
>
> Cheers,
> Björn
Regards,
Anup
Anup Patel <[email protected]> writes:
>> Nope. Same mechanics on x86 -- the cleanup has to be done on the
>> originating core. What I asked was "what about using a timer instead of
>> an IPI". I think this was up in the last rev as well?
>>
>> Check out commit bdc1dad299bb ("x86/vector: Replace
>> IRQ_MOVE_CLEANUP_VECTOR with a timer callback"). Specifically, the
>> comment about lost interrupts, and the rationale for keeping the original
>> target active until there's a new interrupt on the new cpu.
>
> Trying timer interrupt is still TBD on my side because with v12
> my goal was to implement per-device MSI domains. Let me
> explore timer interrupts for v13.
OK!
>> >> I wonder if this clean up is less intrusive, and you just need to
>> >> perform what's in the per-list instead of dealing with the
>> >> ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
>> >> chip_data/desc has that information. This would mean that
>> >> imsic_local_priv() would only have the local vectors (chip_data), and
>> >> a cleanup list/timer.
>> >>
>> >> My general comment is that instead of having these global id-tracking
>> >> structures, use the matrix together with some desc/chip_data local
>> >> data, which should be sufficient.
>> >
>> > The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors
>> > are required since the matrix allocator only manages allocation of
>> > per-CPU IDs.
>>
>> The information in ids_enabled_bitmap is/could be inherent in
>> imsic_local_priv.vectors (guess what x86 does... ;-)).
>>
>> Dummy hwirqs could be replaced with the virq.
>>
>> Hmm, seems like we're talking past each other, or at least I get the
>> feeling I can't get my opinions out right. I'll try to do a quick PoC,
>> to show you what I mean. That's probably easier than just talking about
>> it. ...and maybe I'll come realizing I'm all wrong!
>
> I suggest to wait for my v13 and try something on top of that
> otherwise we might duplicate efforts.
OK!
>> > I did not see any PCIe or platform device requiring this kind of
>> > reservation. Any examples ?
>>
>> It's not a requirement. Some devices allocate a gazillion interrupts
>> (NICs with many QoS queues, e.g.), but only activate a subset (via
>> request_irq()). A system using these kind of devices might run out of
>> interrupts.
>
> I don't see how this is not possible currently.
Again, this is something we can improve on later. But, this
implementation activates the interrupt at allocation time, no?
>> Problems you run into once you leave the embedded world, pretty much.
>>
>> >> * Handle managed interrupts
>> >
>> > Any examples of managed interrupts in the RISC-V world ?
>>
>> E.g. all nvme drives: nvme_setup_irqs(), and I'd assume contemporary
>> netdev drivers would use it. Typically devices with per-cpu queues.
>
> We have tested with NVMe devices, e1000e, VirtIO-net, etc and I did
> not see any issue.
>
> We can always add new features as separate incremental series as long
> as there is a clear use-case backed by real-world devices.
Agreed. Let's not feature creep.
Björn
Anup Patel <[email protected]> writes:
> The RISC-V advanced interrupt architecture (AIA) specification
> defines a new MSI controller called incoming message signalled
> interrupt controller (IMSIC) which manages MSI on per-HART (or
> per-CPU) basis. It also supports IPIs as software injected MSIs.
> (For more details refer https://github.com/riscv/riscv-aia)
>
> Let us add an early irqchip driver for RISC-V IMSIC which sets
> up the IMSIC state and provides IPIs.
>
> Signed-off-by: Anup Patel <[email protected]>
> ---
> drivers/irqchip/Kconfig | 7 +
> drivers/irqchip/Makefile | 1 +
> drivers/irqchip/irq-riscv-imsic-early.c | 241 +++++++
> drivers/irqchip/irq-riscv-imsic-state.c | 887 ++++++++++++++++++++++++
> drivers/irqchip/irq-riscv-imsic-state.h | 105 +++
> include/linux/irqchip/riscv-imsic.h | 87 +++
> 6 files changed, 1328 insertions(+)
> create mode 100644 drivers/irqchip/irq-riscv-imsic-early.c
> create mode 100644 drivers/irqchip/irq-riscv-imsic-state.c
> create mode 100644 drivers/irqchip/irq-riscv-imsic-state.h
> create mode 100644 include/linux/irqchip/riscv-imsic.h
>
> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> index f7149d0f3d45..85f86e31c996 100644
> --- a/drivers/irqchip/Kconfig
> +++ b/drivers/irqchip/Kconfig
> @@ -546,6 +546,13 @@ config SIFIVE_PLIC
> select IRQ_DOMAIN_HIERARCHY
> select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
>
> +config RISCV_IMSIC
> + bool
> + depends on RISCV
> + select IRQ_DOMAIN_HIERARCHY
> + select GENERIC_IRQ_MATRIX_ALLOCATOR
> + select GENERIC_MSI_IRQ
> +
> config EXYNOS_IRQ_COMBINER
> bool "Samsung Exynos IRQ combiner support" if COMPILE_TEST
> depends on (ARCH_EXYNOS && ARM) || COMPILE_TEST
> diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
> index ffd945fe71aa..d714724387ce 100644
> --- a/drivers/irqchip/Makefile
> +++ b/drivers/irqchip/Makefile
> @@ -95,6 +95,7 @@ obj-$(CONFIG_QCOM_MPM) += irq-qcom-mpm.o
> obj-$(CONFIG_CSKY_MPINTC) += irq-csky-mpintc.o
> obj-$(CONFIG_CSKY_APB_INTC) += irq-csky-apb-intc.o
> obj-$(CONFIG_RISCV_INTC) += irq-riscv-intc.o
> +obj-$(CONFIG_RISCV_IMSIC) += irq-riscv-imsic-state.o irq-riscv-imsic-early.o
> obj-$(CONFIG_SIFIVE_PLIC) += irq-sifive-plic.o
> obj-$(CONFIG_IMX_IRQSTEER) += irq-imx-irqsteer.o
> obj-$(CONFIG_IMX_INTMUX) += irq-imx-intmux.o
> diff --git a/drivers/irqchip/irq-riscv-imsic-early.c b/drivers/irqchip/irq-riscv-imsic-early.c
> new file mode 100644
> index 000000000000..3557e32a713c
> --- /dev/null
> +++ b/drivers/irqchip/irq-riscv-imsic-early.c
> @@ -0,0 +1,241 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> + * Copyright (C) 2022 Ventana Micro Systems Inc.
> + */
> +
> +#define pr_fmt(fmt) "riscv-imsic: " fmt
> +#include <linux/cpu.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/irq.h>
> +#include <linux/irqchip.h>
> +#include <linux/irqchip/chained_irq.h>
> +#include <linux/module.h>
> +#include <linux/spinlock.h>
> +#include <linux/smp.h>
> +
> +#include "irq-riscv-imsic-state.h"
> +
> +static int imsic_parent_irq;
> +
> +#ifdef CONFIG_SMP
> +static irqreturn_t imsic_local_sync_handler(int irq, void *data)
> +{
> + imsic_local_sync();
> + return IRQ_HANDLED;
> +}
> +
> +static void imsic_ipi_send(unsigned int cpu)
> +{
> + struct imsic_local_config *local =
> + per_cpu_ptr(imsic->global.local, cpu);
> +
> + writel_relaxed(IMSIC_IPI_ID, local->msi_va);
> +}
> +
> +static void imsic_ipi_starting_cpu(void)
> +{
> + /* Enable IPIs for current CPU. */
> + __imsic_id_set_enable(IMSIC_IPI_ID);
> +
> + /* Enable virtual IPI used for IMSIC ID synchronization */
> + enable_percpu_irq(imsic->ipi_virq, 0);
> +}
> +
> +static void imsic_ipi_dying_cpu(void)
> +{
> + /*
> + * Disable virtual IPI used for IMSIC ID synchronization so
> + * that we don't receive ID synchronization requests.
> + */
> + disable_percpu_irq(imsic->ipi_virq);
> +}
> +
> +static int __init imsic_ipi_domain_init(void)
> +{
> + int virq;
> +
> + /* Create IMSIC IPI multiplexing */
> + virq = ipi_mux_create(IMSIC_NR_IPI, imsic_ipi_send);
> + if (virq <= 0)
> + return (virq < 0) ? virq : -ENOMEM;
> + imsic->ipi_virq = virq;
> +
> + /* First vIRQ is used for IMSIC ID synchronization */
> + virq = request_percpu_irq(imsic->ipi_virq, imsic_local_sync_handler,
> + "riscv-imsic-lsync", imsic->global.local);
There's a lot of boilerplate for the local-sync IPI. Any reason not to
use what the kernel provides out-of-the-box:
int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
int wait);
e.g.
smp_call_function_single(target_cpu, imsic_local_sync_with_new_signature, NULL, 0);
Björn
Björn Töpel <[email protected]> writes:
>>> Hmm, seems like we're talking past each other, or at least I get the
>>> feeling I can't get my opinions out right. I'll try to do a quick PoC,
>>> to show you what I mean. That's probably easier than just talking about
>>> it. ...and maybe I'll come realizing I'm all wrong!
>>
>> I suggest to wait for my v13 and try something on top of that
>> otherwise we might duplicate efforts.
>
> OK!
I did some really rough code sketching, and I'm confident that you can
get rid of all ids_enabled_bitmap, hwirqs_used_bitmap, and the
corresponding functions/locks. I'd say one lock is enough, and the key
is having the per-cpu imsic_local_priv.vectors change from struct
imsic_vector * to struct imsic_vector **.
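Roughly along these lines on the data-structure side (just a sketch, the
names are made up):

struct imsic_vector;        /* per-interrupt chip_data, as in the series */

struct imsic_local_priv_sketch {
        raw_spinlock_t lock;                /* the single remaining per-CPU lock */
        struct imsic_vector **vectors;      /* nr_ids pointer slots */
};

/* During a move the same vector can be referenced from both the old and the
 * new CPU's table until the old slot is cleaned up. */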
Using smp_call_function_single() to IPI enable (and disable if you don't
want to use the lazy timer disable mechanism) seems feasible as well!
(Let me know if you don't have the spare cycles, and I can help out.)
Björn
On Wed, Feb 7, 2024 at 6:25 PM Björn Töpel <[email protected]> wrote:
>
> Björn Töpel <[email protected]> writes:
>
> >>> Hmm, seems like we're talking past each other, or at least I get the
> >>> feeling I can't get my opinions out right. I'll try to do a quick PoC,
> >>> to show you what I mean. That's probably easier than just talking about
> >>> it. ...and maybe I'll come realizing I'm all wrong!
> >>
> >> I suggest to wait for my v13 and try something on top of that
> >> otherwise we might duplicate efforts.
> >
> > OK!
>
> I did some really rough code sketching, and I'm confident that you can
> get rid of all ids_enabled_bitmap, hwirqs_used_bitmap, and the
> corresponding functions/locks. I'd say one lock is enough, and the key
> is having the per-cpu imsic_local_priv.vectors change from struct
> imsic_vector * to struct imsic_vector **.
I have managed to remove hwirqs_bitmap (and the related functions).
Now, I am trying another approach to simplify locking using atomics.
>
> Using smp_call_function_single() to IPI enable (and disable if you don't
> want to use the lazy timer disable mechanism) seems feasible as well!
We have intentionally kept a separate virq for synchronization because
this allows us to gather stats for debugging. Calling smp_call_function_single()
will not allow us to separately gather stats for sync IPIs.
The smp_call_function_single() eventually leads to __ipi_send_mask()
via send_ipi_mask() in arch/riscv so directly calling __ipi_send_mask()
for sync IPI is faster.
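In other words, the sync IPI path boils down to something like this
(sketch only; exactly which desc/virq is used is illustrative):

static void imsic_remote_sync_sketch(unsigned int cpu)
{
        struct irq_desc *desc = irq_to_desc(imsic->ipi_virq);

        /* Bypass the generic smp_call path and signal the sync vIRQ directly */
        __ipi_send_mask(desc, cpumask_of(cpu));
}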
>
> (Let me know if you don't have the spare cycles, and I can help out.)
>
>
> Björn
Regards,
Anup
On Wed, Feb 7, 2024 at 6:25 PM Björn Töpel <[email protected]> wrote:
>
> Björn Töpel <[email protected]> writes:
>
> >>> Hmm, seems like we're talking past each other, or at least I get the
> >>> feeling I can't get my opinions out right. I'll try to do a quick PoC,
> >>> to show you what I mean. That's probably easier than just talking about
> >>> it. ...and maybe I'll come realizing I'm all wrong!
> >>
> >> I suggest to wait for my v13 and try something on top of that
> >> otherwise we might duplicate efforts.
> >
> > OK!
>
> I did some really rough code sketching, and I'm confident that you can
> get rid of all ids_enabled_bitmap, hwirqs_used_bitmap, and the
> corresponding functions/locks. I'd say one lock is enough, and the key
> is having the per-cpu imsic_local_priv.vectors change from struct
> imsic_vector * to struct imsic_vector **.
>
> Using smp_call_function_single() to IPI enable (and disable if you don't
> want to use the lazy timer disable mechanism) seems feasible as well!
>
> (Let me know if you don't have the spare cycles, and I can help out.)
If you can help upstream the IOMMU driver then that would be awesome.
Regards,
Anup
Hi Anup,
I understand that some refactoring is in progress, but I haven't seen the
report below addressed; adding it here hoping that it can be useful towards v13.
(Unfortunately, I didn't have enough time to debug this yet...)
> irqchip/sifive-plic: Convert PLIC driver into a platform driver
I'm seeing the following LOCKDEP warning with this series, bisected to
the commit above. This is a defconfig + PROVE_LOCKING=y build, booted
using -machine virt,aia=none.
[ 0.953473] ========================================================
[ 0.953704] WARNING: possible irq lock inversion dependency detected
[ 0.953955] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
[ 0.954224] --------------------------------------------------------
[ 0.954444] swapper/0/0 just changed the state of lock:
[ 0.954664] ffffaf808109d0c8 (&irq_desc_lock_class){-...}-{2:2}, at: handle_fasteoi_irq+0x24/0x1da
[ 0.955699] but this lock took another, HARDIRQ-unsafe lock in the past:
[ 0.955942] (&handler->enable_lock){+.+.}-{2:2}
[ 0.955974]
and interrupts could create inverse lock ordering between them.
[ 0.956507]
other info that might help us debug this:
[ 0.956775] Possible interrupt unsafe locking scenario:
[ 0.956998] CPU0 CPU1
[ 0.957247] ---- ----
[ 0.957439] lock(&handler->enable_lock);
[ 0.957607] local_irq_disable();
[ 0.957793] lock(&irq_desc_lock_class);
[ 0.958021] lock(&handler->enable_lock);
[ 0.958246] <Interrupt>
[ 0.958342] lock(&irq_desc_lock_class);
[ 0.958501]
*** DEADLOCK ***
[ 0.958715] no locks held by swapper/0/0.
[ 0.958870]
the shortest dependencies between 2nd lock and 1st lock:
[ 0.959152] -> (&handler->enable_lock){+.+.}-{2:2} {
[ 0.959372] HARDIRQ-ON-W at:
[ 0.959522] __lock_acquire+0x884/0x1f5c
[ 0.959745] lock_acquire+0xf0/0x292
[ 0.959913] _raw_spin_lock+0x2c/0x40
[ 0.960090] plic_probe+0x322/0x65c
[ 0.960257] platform_probe+0x4e/0x92
[ 0.960432] really_probe+0x82/0x210
[ 0.960598] __driver_probe_device+0x5c/0xd0
[ 0.960784] driver_probe_device+0x2c/0xb0
[ 0.960964] __driver_attach+0x72/0x10a
[ 0.961151] bus_for_each_dev+0x60/0xac
[ 0.961330] driver_attach+0x1a/0x22
[ 0.961496] bus_add_driver+0xd4/0x19e
[ 0.961666] driver_register+0x3e/0xd8
[ 0.961835] __platform_driver_register+0x1c/0x24
[ 0.962030] plic_driver_init+0x1a/0x22
[ 0.962201] do_one_initcall+0x80/0x268
[ 0.962371] kernel_init_freeable+0x296/0x300
[ 0.962554] kernel_init+0x1e/0x10a
[ 0.962713] ret_from_fork+0xe/0x1c
[ 0.962884] SOFTIRQ-ON-W at:
[ 0.962994] __lock_acquire+0x89e/0x1f5c
[ 0.963169] lock_acquire+0xf0/0x292
[ 0.963336] _raw_spin_lock+0x2c/0x40
[ 0.963497] plic_probe+0x322/0x65c
[ 0.963664] platform_probe+0x4e/0x92
[ 0.963849] really_probe+0x82/0x210
[ 0.964054] __driver_probe_device+0x5c/0xd0
[ 0.964255] driver_probe_device+0x2c/0xb0
[ 0.964428] __driver_attach+0x72/0x10a
[ 0.964603] bus_for_each_dev+0x60/0xac
[ 0.964777] driver_attach+0x1a/0x22
[ 0.964943] bus_add_driver+0xd4/0x19e
[ 0.965343] driver_register+0x3e/0xd8
[ 0.965527] __platform_driver_register+0x1c/0x24
[ 0.965732] plic_driver_init+0x1a/0x22
[ 0.965908] do_one_initcall+0x80/0x268
[ 0.966078] kernel_init_freeable+0x296/0x300
[ 0.966268] kernel_init+0x1e/0x10a
[ 0.966436] ret_from_fork+0xe/0x1c
[ 0.966599] INITIAL USE at:
[ 0.966716] __lock_acquire+0x3fc/0x1f5c
[ 0.966891] lock_acquire+0xf0/0x292
[ 0.967048] _raw_spin_lock+0x2c/0x40
[ 0.967206] plic_probe+0x322/0x65c
[ 0.967360] platform_probe+0x4e/0x92
[ 0.967522] really_probe+0x82/0x210
[ 0.967678] __driver_probe_device+0x5c/0xd0
[ 0.967853] driver_probe_device+0x2c/0xb0
[ 0.968025] __driver_attach+0x72/0x10a
[ 0.968185] bus_for_each_dev+0x60/0xac
[ 0.968348] driver_attach+0x1a/0x22
[ 0.968513] bus_add_driver+0xd4/0x19e
[ 0.968678] driver_register+0x3e/0xd8
[ 0.968839] __platform_driver_register+0x1c/0x24
[ 0.969035] plic_driver_init+0x1a/0x22
[ 0.969239] do_one_initcall+0x80/0x268
[ 0.969431] kernel_init_freeable+0x296/0x300
[ 0.969610] kernel_init+0x1e/0x10a
[ 0.969766] ret_from_fork+0xe/0x1c
[ 0.969936] }
[ 0.970010] ... key at: [<ffffffff824f4138>] __key.2+0x0/0x10
[ 0.970224] ... acquired at:
[ 0.970353] lock_acquire+0xf0/0x292
[ 0.970482] _raw_spin_lock+0x2c/0x40
[ 0.970609] plic_irq_enable+0x7e/0x140
[ 0.970739] irq_enable+0x2c/0x60
[ 0.970882] __irq_startup+0x58/0x60
[ 0.971008] irq_startup+0x5e/0x13c
[ 0.971126] __setup_irq+0x4de/0x5da
[ 0.971248] request_threaded_irq+0xcc/0x12e
[ 0.971394] vm_find_vqs+0x62/0x50a
[ 0.971518] probe_common+0xfe/0x1d2
[ 0.971635] virtrng_probe+0xc/0x14
[ 0.971751] virtio_dev_probe+0x154/0x1fc
[ 0.971878] really_probe+0x82/0x210
[ 0.972008] __driver_probe_device+0x5c/0xd0
[ 0.972147] driver_probe_device+0x2c/0xb0
[ 0.972280] __driver_attach+0x72/0x10a
[ 0.972407] bus_for_each_dev+0x60/0xac
[ 0.972540] driver_attach+0x1a/0x22
[ 0.972656] bus_add_driver+0xd4/0x19e
[ 0.972777] driver_register+0x3e/0xd8
[ 0.972896] register_virtio_driver+0x1c/0x2a
[ 0.973049] virtio_rng_driver_init+0x18/0x20
[ 0.973236] do_one_initcall+0x80/0x268
[ 0.973399] kernel_init_freeable+0x296/0x300
[ 0.973540] kernel_init+0x1e/0x10a
[ 0.973658] ret_from_fork+0xe/0x1c
[ 0.973858] -> (&irq_desc_lock_class){-...}-{2:2} {
[ 0.974036] IN-HARDIRQ-W at:
[ 0.974142] __lock_acquire+0xa82/0x1f5c
[ 0.974309] lock_acquire+0xf0/0x292
[ 0.974467] _raw_spin_lock+0x2c/0x40
[ 0.974625] handle_fasteoi_irq+0x24/0x1da
[ 0.974794] generic_handle_domain_irq+0x1c/0x2a
[ 0.974982] plic_handle_irq+0x7e/0xf0
[ 0.975143] generic_handle_domain_irq+0x1c/0x2a
[ 0.975329] riscv_intc_irq+0x2e/0x46
[ 0.975488] handle_riscv_irq+0x4a/0x74
[ 0.975652] call_on_irq_stack+0x32/0x40
[ 0.975817] INITIAL USE at:
[ 0.975923] __lock_acquire+0x3fc/0x1f5c
[ 0.976087] lock_acquire+0xf0/0x292
[ 0.976244] _raw_spin_lock_irqsave+0x3a/0x64
[ 0.976423] __irq_get_desc_lock+0x5a/0x84
[ 0.976594] irq_modify_status+0x2a/0x106
[ 0.976764] irq_set_percpu_devid+0x62/0x78
[ 0.976939] riscv_intc_domain_map+0x1e/0x54
[ 0.977133] irq_domain_associate_locked+0x42/0xe4
[ 0.977363] irq_create_mapping_affinity+0x98/0xd4
[ 0.977570] sbi_ipi_init+0x70/0x142
[ 0.977744] init_IRQ+0xfe/0x11a
[ 0.977906] start_kernel+0x4ae/0x790
[ 0.978082] }
[ 0.978151] ... key at: [<ffffffff824bbee0>] irq_desc_lock_class+0x0/0x10
[ 0.978389] ... acquired at:
[ 0.978494] mark_lock+0x3fe/0x8c2
[ 0.978624] __lock_acquire+0xa82/0x1f5c
[ 0.978766] lock_acquire+0xf0/0x292
[ 0.978897] _raw_spin_lock+0x2c/0x40
[ 0.979029] handle_fasteoi_irq+0x24/0x1da
[ 0.979171] generic_handle_domain_irq+0x1c/0x2a
[ 0.979326] plic_handle_irq+0x7e/0xf0
[ 0.979460] generic_handle_domain_irq+0x1c/0x2a
[ 0.979618] riscv_intc_irq+0x2e/0x46
[ 0.979751] handle_riscv_irq+0x4a/0x74
[ 0.979888] call_on_irq_stack+0x32/0x40
[ 0.980110]
stack backtrace:
[ 0.980358] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
[ 0.980662] Hardware name: riscv-virtio,qemu (DT)
[ 0.980913] Call Trace:
[ 0.981042] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
[ 0.981246] [<ffffffff80ae020a>] show_stack+0x2c/0x38
[ 0.981456] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
[ 0.981648] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
[ 0.981813] [<ffffffff80ae17a4>] print_irq_inversion_bug.part.0+0x162/0x176
[ 0.982031] [<ffffffff8007c6e6>] mark_lock+0x3fe/0x8c2
[ 0.982198] [<ffffffff8007d888>] __lock_acquire+0xa82/0x1f5c
[ 0.982377] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
[ 0.982549] [<ffffffff80af9962>] _raw_spin_lock+0x2c/0x40
[ 0.982721] [<ffffffff8008f3fe>] handle_fasteoi_irq+0x24/0x1da
[ 0.982904] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
[ 0.983112] [<ffffffff80581dc0>] plic_handle_irq+0x7e/0xf0
[ 0.983293] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
[ 0.983495] [<ffffffff8057fb1a>] riscv_intc_irq+0x2e/0x46
[ 0.983671] [<ffffffff80aedb4c>] handle_riscv_irq+0x4a/0x74
[ 0.983856] [<ffffffff80afa756>] call_on_irq_stack+0x32/0x40
When I switch to -machine virt,aia=aplic-imsic (same config as above), I
get:
[ 0.971406] ============================================
[ 0.971439] WARNING: possible recursive locking detected
[ 0.971497] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
[ 0.971583] --------------------------------------------
[ 0.971612] swapper/0/1 is trying to acquire lock:
[ 0.971662] ffffaf83fefa8e78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x92/0x146
[ 0.971927]
but task is already holding lock:
[ 0.971975] ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
[ 0.972045]
other info that might help us debug this:
[ 0.972085] Possible unsafe locking scenario:
[ 0.972114] CPU0
[ 0.972133] ----
[ 0.972153] lock(&lpriv->ids_lock);
[ 0.972191] lock(&lpriv->ids_lock);
[ 0.972228]
*** DEADLOCK ***
[ 0.972258] May be due to missing lock nesting notation
[ 0.972306] 6 locks held by swapper/0/1:
[ 0.972338] #0: ffffaf8081f65970 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x6a/0x10a
[ 0.972413] #1: ffffaf808217c240 (&desc->request_mutex){+.+.}-{3:3}, at: __setup_irq+0xa2/0x5da
[ 0.972492] #2: ffffaf808217c0c8 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xbe/0x5da
[ 0.972555] #3: ffffffff81892ac0 (mask_lock){....}-{2:2}, at: irq_setup_affinity+0x38/0xc6
[ 0.972617] #4: ffffffff81892a80 (tmp_mask_lock){....}-{2:2}, at: irq_do_set_affinity+0x3a/0x164
[ 0.972681] #5: ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
[ 0.972753]
stack backtrace:
[ 0.972852] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
[ 0.972900] Hardware name: riscv-virtio,qemu (DT)
[ 0.972987] Call Trace:
[ 0.973019] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
[ 0.973054] [<ffffffff80ae020a>] show_stack+0x2c/0x38
[ 0.973083] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
[ 0.973112] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
[ 0.973139] [<ffffffff8007ad5e>] print_deadlock_bug+0x282/0x328
[ 0.973168] [<ffffffff8007e15c>] __lock_acquire+0x1356/0x1f5c
[ 0.973198] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
[ 0.973225] [<ffffffff80af9adc>] _raw_spin_lock_irqsave+0x3a/0x64
[ 0.973255] [<ffffffff80581210>] imsic_vector_move+0x92/0x146
[ 0.973285] [<ffffffff80581a04>] imsic_irq_set_affinity+0x8e/0xc6
[ 0.973315] [<ffffffff8008c86a>] irq_do_set_affinity+0x142/0x164
[ 0.973345] [<ffffffff8008cc22>] irq_setup_affinity+0x68/0xc6
[ 0.973374] [<ffffffff8008fa82>] irq_startup+0x72/0x13c
[ 0.973401] [<ffffffff8008d40c>] __setup_irq+0x4de/0x5da
[ 0.973430] [<ffffffff8008d5d4>] request_threaded_irq+0xcc/0x12e
[ 0.973460] [<ffffffff806346d8>] vp_find_vqs_msix+0x114/0x376
[ 0.973491] [<ffffffff80634970>] vp_find_vqs+0x36/0x136
[ 0.973518] [<ffffffff80633280>] vp_modern_find_vqs+0x16/0x4e
[ 0.973547] [<ffffffff80ab31f8>] p9_virtio_probe+0x8e/0x31c
[ 0.973576] [<ffffffff8062d982>] virtio_dev_probe+0x154/0x1fc
[ 0.973605] [<ffffffff80693738>] really_probe+0x82/0x210
[ 0.973632] [<ffffffff80693922>] __driver_probe_device+0x5c/0xd0
[ 0.973661] [<ffffffff806939c2>] driver_probe_device+0x2c/0xb0
[ 0.973690] [<ffffffff80693b46>] __driver_attach+0x72/0x10a
[ 0.973718] [<ffffffff8069191a>] bus_for_each_dev+0x60/0xac
[ 0.973746] [<ffffffff80693164>] driver_attach+0x1a/0x22
[ 0.973773] [<ffffffff80692ade>] bus_add_driver+0xd4/0x19e
[ 0.973801] [<ffffffff8069487e>] driver_register+0x3e/0xd8
[ 0.973829] [<ffffffff8062d1ce>] register_virtio_driver+0x1c/0x2a
[ 0.973858] [<ffffffff80c3da52>] p9_virtio_init+0x36/0x56
[ 0.973887] [<ffffffff800028fe>] do_one_initcall+0x80/0x268
[ 0.973915] [<ffffffff80c01144>] kernel_init_freeable+0x296/0x300
[ 0.973944] [<ffffffff80af05dc>] kernel_init+0x1e/0x10a
[ 0.973972] [<ffffffff80afa716>] ret_from_fork+0xe/0x1c
FWIW, the full Qemu command I used was as follows:
sudo /home/andrea/Downloads/qemu/build/qemu-system-riscv64 \
-append "root=/dev/root rw rootfstype=9p rootflags=version=9p2000.L,trans=virtio,cache=mmap,access=any raid=noautodetect security=none loglevel=7" \
-cpu rv64,sv57=off,svadu=off,svnapot=off \
-device virtio-net-device,netdev=net0 \
-device virtio-rng-device,rng=rng0 \
-device virtio-9p-pci,fsdev=root,mount_tag=/dev/root \
-fsdev local,id=root,path=/home/andrea/Downloads/jammy/,security_model=none \
-kernel /home/andrea/linux/arch/riscv/boot/Image \
-m 16G \
-machine virt,aia=<either "none" or "aplic-imsic"> \
-monitor telnet:127.0.0.1:55555,server,nowait \
-netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
-nographic \
-object rng-random,filename=/dev/urandom,id=rng0 \
-serial mon:stdio \
-smp 5
Andrea
Anup!
On Sat, Jan 27 2024 at 21:50, Anup Patel wrote:
>> Changes since v11:
>> - Rebased on Linux-6.8-rc1
>> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
>> MSI handling to per device MSI domains" series by Thomas.
>> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
>> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
>> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
>> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>> - Updated IMSIC driver to support per-device MSI domains for PCI and
>> platform devices.
>
> I have rebased and included 13 patches (which add per-device MSI domain
> infrastructure) from your series [1]. In this series, the IMSIC driver
> implements the msi_parent_ops and APLIC driver implements wired-to-msi
> bridge using your new infrastructure.
>
> The remaining 27 patches of your series [1] require testing on ARM
> platforms which I don't have. I suggest these remaining patches
> go as a separate series.
Of course. Darwi (in Cc) is going to work on the ARM parts when he
returns from vacation. I'm going to apply the infrastructure patches
(1-13) in the next days so they are out of the way for you and Darwi,
unless someone has any objections.
Thanks for picking this up and driving it forward!
tglx
On Thu, Feb 15, 2024 at 1:24 AM Thomas Gleixner <[email protected]> wrote:
>
> Anup!
>
> On Sat, Jan 27 2024 at 21:50, Anup Patel wrote:
> >> Changes since v11:
> >> - Rebased on Linux-6.8-rc1
> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> >> MSI handling to per device MSI domains" series by Thomas.
> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
> >> platform devices.
> >
> > I have rebased and included 13 patches (which add per-device MSI domain
> > infrastructure) from your series [1]. In this series, the IMSIC driver
> > implements the msi_parent_ops and APLIC driver implements wired-to-msi
> > bridge using your new infrastructure.
> >
> > The remaining 27 patches of your series [1] require testing on ARM
> > platforms which I don't have. I suggest these remaining patches
> > go as a separate series.
>
> Of course. Darwi (in Cc) is going to work on the ARM parts when he
> returns from vacation. I'm going to apply the infrastructure patches
> (1-13) in the next days so they are out of the way for you and Darwi,
> unless someone has any objections.
>
> Thanks for picking this up and driving it forward!
Thanks Thomas, I will be sending v13 of this series next week.
For the time being, I will carry the 13 infrastructure patches in
this series until they land in upstream Linux so that it is easier
for people to try this series.
Regards,
Anup
On Sat, 27 Jan 2024 16:17:32 +0000,
Anup Patel <[email protected]> wrote:
>
> From: Thomas Gleixner <[email protected]>
>
> Add a new domain bus token to prepare for device MSI which aims to replace
> the existing platform MSI maze.
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Anup Patel <[email protected]>
> ---
> include/linux/irqdomain_defs.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
> index c29921fd8cd1..4c69151cb9d2 100644
> --- a/include/linux/irqdomain_defs.h
> +++ b/include/linux/irqdomain_defs.h
> @@ -26,6 +26,7 @@ enum irq_domain_bus_token {
> DOMAIN_BUS_DMAR,
> DOMAIN_BUS_AMDVI,
> DOMAIN_BUS_PCI_DEVICE_IMS,
> + DOMAIN_BUS_DEVICE_IMS,
Only a personal taste, but since we keep calling it "device MSI",
which it really is, I find it slightly odd to name the token
"DEVICE_IMS".
From what I understand, IMS is PCIe specific. Platform (and by
extension device) MSI extends far beyond PCIe. So here, DEVICE_MSI
would make a lot more sense and avoid confusion.
But hey, I don't have much skin in this game, and I can probably
mentally rotate the acronym...
M.
--
Without deviation from the norm, progress is not possible.
On Sat, 27 Jan 2024 16:17:29 +0000,
Anup Patel <[email protected]> wrote:
>
> From: Thomas Gleixner <[email protected]>
>
> Currently the irqdomain select callback is only invoked when the parameter
> count of the fwspec arguments is not zero. That makes sense because then
> the match is on the firmware node and eventually on the bus_token, which is
> already handled in the core code.
>
> The upcoming support for per device MSI domains requires to do real bus
> token specific checks in the MSI parent domains with a zero parameter
> count.
>
> Make the gic-v3 select() callback handle that case.
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Anup Patel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Marc Zyngier <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Without deviation from the norm, progress is not possible.
On Wed, 14 Feb 2024 19:54:49 +0000,
Thomas Gleixner <[email protected]> wrote:
>
> Anup!
>
> On Sat, Jan 27 2024 at 21:50, Anup Patel wrote:
> >> Changes since v11:
> >> - Rebased on Linux-6.8-rc1
> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> >> MSI handling to per device MSI domains" series by Thomas.
> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> >> https://lore.kernel.org/linux-arm-kernel/[email protected]/)
> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
> >> platform devices.
> >
> > I have rebased and included 13 patches (which add per-device MSI domain
> > infrastructure) from your series [1]. In this series, the IMSIC driver
> > implements the msi_parent_ops and APLIC driver implements wired-to-msi
> > bridge using your new infrastructure.
> >
> > The remaining 27 patches of your series [1] require testing on ARM
> > platforms which I don't have. I suggest these remaining patches
> > go as a separate series.
>
> Of course. Darwi (in Cc) is going to work on the ARM parts when he
> returns from vacation. I'm going to apply the infrastructure patches
> (1-13) in the next days so they are out of the way for you and Darwi,
> unless someone has any objections.
FWIW, I've given the first 13 patches a go on two of the most
problematic platforms (Huawei's D05, and Marvell's McBin). Nothing
immediately broke, so it's obviously perfect.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Thu, Feb 15 2024 at 11:54, Marc Zyngier wrote:
> On Sat, 27 Jan 2024 16:17:32 +0000,
> Anup Patel <[email protected]> wrote:
>> DOMAIN_BUS_PCI_DEVICE_IMS,
>> + DOMAIN_BUS_DEVICE_IMS,
>
> Only a personal taste, but since we keep calling it "device MSI",
> which it really is, I find it slightly odd to name the token
> "DEVICE_IMS".
>
> From what I understand, IMS is PCIe specific. Platform (and by
> extension device) MSI extends far beyond PCIe. So here, DEVICE_MSI
> would make a lot more sense and avoid confusion.
That's true, but I chose it intentionally because Interrupt Message
Store (IMS) is a (PCI) device specific way to store the message contrary
to PCI/MSI[-X] which has standardized storage.
So my thought was that this exactly reflects what the platform device
requires: device specific message store, aka DMS or DSMS :)
> But hey, I don't have much skin in this game, and I can probably
> mentally rotate the acronym...
I have no strong opinion about it though.
Thanks,
tglx
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 9bbe13a5d414a7f8208dba64b54d2b6e4f7086bd
Gitweb: https://git.kernel.org/tip/9bbe13a5d414a7f8208dba64b54d2b6e4f7086bd
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:41 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:41 +01:00
genirq/msi: Provide MSI_FLAG_PARENT_PM_DEV
Some platform-MSI implementations require that power management is
redirected to the underlying interrupt chip device. To make this work
with per device MSI domains provide a new feature flag and let the
core code handle the setup of dev->pm_dev when set during device MSI
domain creation.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/msi.h | 2 ++
kernel/irq/msi.c | 5 ++++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 36ba6a0..26d07e2 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -554,6 +554,8 @@ enum {
MSI_FLAG_FREE_MSI_DESCS = (1 << 6),
/* Use dev->fwnode for MSI device domain creation */
MSI_FLAG_USE_DEV_FWNODE = (1 << 7),
+ /* Set parent->dev into domain->pm_dev on device domain creation */
+ MSI_FLAG_PARENT_PM_DEV = (1 << 8),
/* Mask for the generic functionality */
MSI_GENERIC_FLAGS_MASK = GENMASK(15, 0),
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 07e9daa..f90952e 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -845,8 +845,11 @@ static struct irq_domain *__msi_create_irq_domain(struct fwnode_handle *fwnode,
domain = irq_domain_create_hierarchy(parent, flags | IRQ_DOMAIN_FLAG_MSI, 0,
fwnode, &msi_domain_ops, info);
- if (domain)
+ if (domain) {
irq_domain_update_bus_token(domain, info->bus_token);
+ if (info->flags & MSI_FLAG_PARENT_PM_DEV)
+ domain->pm_dev = parent->pm_dev;
+ }
return domain;
}
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: e49312fe09df36cc4eae0cd6e1b08b563a91e1bc
Gitweb: https://git.kernel.org/tip/e49312fe09df36cc4eae0cd6e1b08b563a91e1bc
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:40 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:41 +01:00
genirq/irqdomain: Reroute device MSI create_mapping
Reroute interrupt allocation in irq_create_fwspec_mapping() if the domain
is a MSI device domain. This is required to convert the support for wire
to MSI bridges to per device MSI domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/irq/irqdomain.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 8fee379..aeb4165 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -29,6 +29,7 @@ static int irq_domain_alloc_irqs_locked(struct irq_domain *domain, int irq_base,
unsigned int nr_irqs, int node, void *arg,
bool realloc, const struct irq_affinity_desc *affinity);
static void irq_domain_check_hierarchy(struct irq_domain *domain);
+static void irq_domain_free_one_irq(struct irq_domain *domain, unsigned int virq);
struct irqchip_fwid {
struct fwnode_handle fwnode;
@@ -858,8 +859,13 @@ unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec)
}
if (irq_domain_is_hierarchy(domain)) {
- virq = irq_domain_alloc_irqs_locked(domain, -1, 1, NUMA_NO_NODE,
- fwspec, false, NULL);
+ if (irq_domain_is_msi_device(domain)) {
+ mutex_unlock(&domain->root->mutex);
+ virq = msi_device_domain_alloc_wired(domain, hwirq, type);
+ mutex_lock(&domain->root->mutex);
+ } else
+ virq = irq_domain_alloc_irqs_locked(domain, -1, 1, NUMA_NO_NODE,
+ fwspec, false, NULL);
if (virq <= 0) {
virq = 0;
goto out;
@@ -914,7 +920,7 @@ void irq_dispose_mapping(unsigned int virq)
return;
if (irq_domain_is_hierarchy(domain)) {
- irq_domain_free_irqs(virq, 1);
+ irq_domain_free_one_irq(domain, virq);
} else {
irq_domain_disassociate(domain, virq);
irq_free_desc(virq);
@@ -1755,6 +1761,14 @@ void irq_domain_free_irqs(unsigned int virq, unsigned int nr_irqs)
irq_free_descs(virq, nr_irqs);
}
+static void irq_domain_free_one_irq(struct irq_domain *domain, unsigned int virq)
+{
+ if (irq_domain_is_msi_device(domain))
+ msi_device_domain_free_wired(domain, virq);
+ else
+ irq_domain_free_irqs(virq, 1);
+}
+
/**
* irq_domain_alloc_irqs_parent - Allocate interrupts from parent domain
* @domain: Domain below which interrupts must be allocated
@@ -1907,9 +1921,9 @@ static int irq_domain_alloc_irqs_locked(struct irq_domain *domain, int irq_base,
return -EINVAL;
}
-static void irq_domain_check_hierarchy(struct irq_domain *domain)
-{
-}
+static void irq_domain_check_hierarchy(struct irq_domain *domain) { }
+static void irq_domain_free_one_irq(struct irq_domain *domain, unsigned int virq) { }
+
#endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */
#ifdef CONFIG_GENERIC_IRQ_DEBUGFS
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 0ee1578b00bcf5ef8e7955f0c6f02a624443eb29
Gitweb: https://git.kernel.org/tip/0ee1578b00bcf5ef8e7955f0c6f02a624443eb29
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:39 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:41 +01:00
genirq/msi: Provide allocation/free functions for "wired" MSI interrupts
To support wire to MSI bridges proper in the MSI core infrastructure it is
required to have separate allocation/free interfaces which can be invoked
from the regular irqdomain allocation/free functions.
The mechanism for allocation is:
- Allocate the next free MSI descriptor index in the domain
- Store the hardware interrupt number and the trigger type
which was extracted by the irqdomain core from the firmware spec
in the MSI descriptor device cookie so it can be retrieved by
the underlying interrupt domain and interrupt chip
- Use the regular MSI allocation mechanism for the newly allocated
  index which returns a fully initialized Linux interrupt on success
This works because:
- the domains have a fixed size
- each hardware interrupt is only allocated once
- the underlying domain does not care about the MSI index it only cares
about the hardware interrupt number and the trigger type
The free function looks up the MSI index in the MSI descriptor of the
provided Linux interrupt number and uses the regular index based free
functions of the MSI core.
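As a concrete illustration of the cookie layout described above (a sketch,
not part of the patch), the underlying wired-to-MSI chip can recover the
wire number and trigger type like this:

static void wired_cookie_unpack_sketch(u64 cookie, irq_hw_number_t *hwirq,
                                       unsigned int *type)
{
        *hwirq = cookie & U32_MAX;  /* low 32 bits: hardware wire number */
        *type = cookie >> 32;       /* high 32 bits: IRQ trigger type */
}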
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/irqdomain.h | 17 ++++++++++-
kernel/irq/msi.c | 68 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 85 insertions(+)
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index ee0a82c..21ecf58 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -619,6 +619,23 @@ static inline bool irq_domain_is_msi_device(struct irq_domain *domain)
#endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */
+#ifdef CONFIG_GENERIC_MSI_IRQ
+int msi_device_domain_alloc_wired(struct irq_domain *domain, unsigned int hwirq,
+ unsigned int type);
+void msi_device_domain_free_wired(struct irq_domain *domain, unsigned int virq);
+#else
+static inline int msi_device_domain_alloc_wired(struct irq_domain *domain, unsigned int hwirq,
+ unsigned int type)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+static inline void msi_device_domain_free_wired(struct irq_domain *domain, unsigned int virq)
+{
+ WARN_ON_ONCE(1);
+}
+#endif
+
#else /* CONFIG_IRQ_DOMAIN */
static inline void irq_dispose_mapping(unsigned int virq) { }
static inline struct irq_domain *irq_find_matching_fwnode(
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 5289fc2..07e9daa 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1540,6 +1540,50 @@ struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, u
return map;
}
+/**
+ * msi_device_domain_alloc_wired - Allocate a "wired" interrupt on @domain
+ * @domain: The domain to allocate on
+ * @hwirq: The hardware interrupt number to allocate for
+ * @type: The interrupt type
+ *
+ * This weirdness supports wire to MSI controllers like MBIGEN.
+ *
+ * @hwirq is the hardware interrupt number which is handed in from
+ * irq_create_fwspec_mapping(). As the wire to MSI domain is sparse, but
+ * sized in firmware, the hardware interrupt number cannot be used as MSI
+ * index. For the underlying irq chip the MSI index is irrelevant and
+ * all it needs is the hardware interrupt number.
+ *
+ * To handle this the MSI index is allocated with MSI_ANY_INDEX and the
+ * hardware interrupt number is stored along with the type information in
+ * msi_desc::cookie so the underlying interrupt chip and domain code can
+ * retrieve it.
+ *
+ * Return: The Linux interrupt number (> 0) or an error code
+ */
+int msi_device_domain_alloc_wired(struct irq_domain *domain, unsigned int hwirq,
+ unsigned int type)
+{
+ unsigned int domid = MSI_DEFAULT_DOMAIN;
+ union msi_instance_cookie icookie = { };
+ struct device *dev = domain->dev;
+ struct msi_map map = { };
+
+ if (WARN_ON_ONCE(!dev || domain->bus_token != DOMAIN_BUS_WIRED_TO_MSI))
+ return -EINVAL;
+
+ icookie.value = ((u64)type << 32) | hwirq;
+
+ msi_lock_descs(dev);
+ if (WARN_ON_ONCE(msi_get_device_domain(dev, domid) != domain))
+ map.index = -EINVAL;
+ else
+ map = __msi_domain_alloc_irq_at(dev, domid, MSI_ANY_INDEX, NULL, &icookie);
+ msi_unlock_descs(dev);
+
+ return map.index >= 0 ? map.virq : map.index;
+}
+
static void __msi_domain_free_irqs(struct device *dev, struct irq_domain *domain,
struct msi_ctrl *ctrl)
{
@@ -1666,6 +1710,30 @@ void msi_domain_free_irqs_all(struct device *dev, unsigned int domid)
}
/**
+ * msi_device_domain_free_wired - Free a wired interrupt in @domain
+ * @domain: The domain to free the interrupt on
+ * @virq: The Linux interrupt number to free
+ *
+ * This is the counterpart of msi_device_domain_alloc_wired() for the
+ * weird wired to MSI converting domains.
+ */
+void msi_device_domain_free_wired(struct irq_domain *domain, unsigned int virq)
+{
+ struct msi_desc *desc = irq_get_msi_desc(virq);
+ struct device *dev = domain->dev;
+
+ if (WARN_ON_ONCE(!dev || !desc || domain->bus_token != DOMAIN_BUS_WIRED_TO_MSI))
+ return;
+
+ msi_lock_descs(dev);
+ if (!WARN_ON_ONCE(msi_get_device_domain(dev, MSI_DEFAULT_DOMAIN) != domain)) {
+ msi_domain_free_irqs_range_locked(dev, MSI_DEFAULT_DOMAIN, desc->msi_index,
+ desc->msi_index);
+ }
+ msi_unlock_descs(dev);
+}
+
+/**
* msi_get_domain_info - Get the MSI interrupt domain info for @domain
* @domain: The interrupt domain to retrieve data from
*
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 9d1c58c8004653b37721dd7b16f4360216778c94
Gitweb: https://git.kernel.org/tip/9d1c58c8004653b37721dd7b16f4360216778c94
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:38 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:41 +01:00
genirq/msi: Optionally use dev->fwnode for device domain
To support wire to MSI domains via the MSI infrastructure it is required to
use the firmware node of the device which implements this for creating the
MSI domain. Otherwise the existing firmware match mechanisms to find the
correct irqdomain for a wired interrupt which is connected to a wire to MSI
bridge would fail.
This cannot be used for the general case because not all devices provide
firmware nodes and all regular per device MSI domains are directly
associated to the device and do not have to be searched for.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/msi.h | 2 ++
kernel/irq/msi.c | 20 ++++++++++++++++----
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 24a5424..36ba6a0 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -552,6 +552,8 @@ enum {
MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS = (1 << 5),
/* Free MSI descriptors */
MSI_FLAG_FREE_MSI_DESCS = (1 << 6),
+ /* Use dev->fwnode for MSI device domain creation */
+ MSI_FLAG_USE_DEV_FWNODE = (1 << 7),
/* Mask for the generic functionality */
MSI_GENERIC_FLAGS_MASK = GENMASK(15, 0),
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 8d46390..5289fc2 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -960,9 +960,9 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
void *chip_data)
{
struct irq_domain *domain, *parent = dev->msi.domain;
- const struct msi_parent_ops *pops;
+ struct fwnode_handle *fwnode, *fwnalloced = NULL;
struct msi_domain_template *bundle;
- struct fwnode_handle *fwnode;
+ const struct msi_parent_ops *pops;
if (!irq_domain_is_msi_parent(parent))
return false;
@@ -985,7 +985,19 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
pops->prefix ? : "", bundle->chip.name, dev_name(dev));
bundle->chip.name = bundle->name;
- fwnode = irq_domain_alloc_named_fwnode(bundle->name);
+ /*
+ * Using the device firmware node is required for wire to MSI
+ * device domains so that the existing firmware results in a domain
+ * match.
+ * All other device domains like PCI/MSI use the named firmware
+ * node as they are not guaranteed to have a fwnode. They are never
+ * looked up and always handled in the context of the device.
+ */
+ if (bundle->info.flags & MSI_FLAG_USE_DEV_FWNODE)
+ fwnode = dev->fwnode;
+ else
+ fwnode = fwnalloced = irq_domain_alloc_named_fwnode(bundle->name);
+
if (!fwnode)
goto free_bundle;
@@ -1012,7 +1024,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
fail:
msi_unlock_descs(dev);
free_fwnode:
- irq_domain_free_fwnode(fwnode);
+ irq_domain_free_fwnode(fwnalloced);
free_bundle:
kfree(bundle);
return false;
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 2d566a498d6483ba986dadc496f64a20b032608f
Gitweb: https://git.kernel.org/tip/2d566a498d6483ba986dadc496f64a20b032608f
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:37 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:40 +01:00
genirq/msi: Provide DOMAIN_BUS_WIRED_TO_MSI
Provide a domain bus token for the upcoming support for wire to MSI device
domains so the domain can be distinguished from regular device MSI domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/irqdomain_defs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index a7dea0c..5c1fe6f 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -27,6 +27,7 @@ enum irq_domain_bus_token {
DOMAIN_BUS_AMDVI,
DOMAIN_BUS_PCI_DEVICE_IMS,
DOMAIN_BUS_DEVICE_MSI,
+ DOMAIN_BUS_WIRED_TO_MSI,
};
#endif /* _LINUX_IRQDOMAIN_DEFS_H */
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 9c78c1a85c04bdfbccc5a50588e001087d942b08
Gitweb: https://git.kernel.org/tip/9c78c1a85c04bdfbccc5a50588e001087d942b08
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:35 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:40 +01:00
genirq/msi: Provide optional translation op
irq_create_fwspec_mapping() requires translation of the firmware spec to a
hardware interrupt number and the trigger type information.
Wired interrupts which are connected to a wire to MSI bridge, like MBIGEN
are allocated that way. So far MBIGEN provides a regular irqdomain which
then hooks backwards into the MSI infrastructure. That's an unholy mess and
will be replaced with per device MSI domains which are regular MSI domains.
Interrupts on MSI domains are not supported by irq_create_fwspec_mapping(),
but for making the wire to MSI bridges sane it makes sense to provide a
special allocation/free interface in the MSI infrastructure. That avoids
the backdoors into the core MSI allocation code and just shares all the
regular MSI infrastructure.
Provide an optional translation callback in msi_domain_ops which can be
utilized by these wire to MSI bridges. No other MSI domain should provide a
translation callback. The default translation callback of the MSI
irqdomains will warn when it is invoked on a non-prepared MSI domain.
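For illustration (a sketch, not part of the patch), a wire-to-MSI bridge
driver could supply the callback roughly like a standard two-cell translate:

static int wired_msi_translate_sketch(struct irq_domain *d, struct irq_fwspec *fwspec,
                                      irq_hw_number_t *hwirq, unsigned int *type)
{
        if (WARN_ON(fwspec->param_count < 2))
                return -EINVAL;

        /* Cell 0: wire number, cell 1: trigger type */
        *hwirq = fwspec->param[0];
        *type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
        return 0;
}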
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/msi.h | 5 +++++
kernel/irq/msi.c | 15 +++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index b0842ea..24a5424 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -412,6 +412,7 @@ bool arch_restore_msi_irqs(struct pci_dev *dev);
struct irq_domain;
struct irq_domain_ops;
struct irq_chip;
+struct irq_fwspec;
struct device_node;
struct fwnode_handle;
struct msi_domain_info;
@@ -431,6 +432,8 @@ struct msi_domain_info;
* function.
* @msi_post_free: Optional function which is invoked after freeing
* all interrupts.
+ * @msi_translate: Optional translate callback to support the odd wire to
+ * MSI bridges, e.g. MBIGEN
*
* @get_hwirq, @msi_init and @msi_free are callbacks used by the underlying
* irqdomain.
@@ -468,6 +471,8 @@ struct msi_domain_ops {
struct device *dev);
void (*msi_post_free)(struct irq_domain *domain,
struct device *dev);
+ int (*msi_translate)(struct irq_domain *domain, struct irq_fwspec *fwspec,
+ irq_hw_number_t *hwirq, unsigned int *type);
};
/**
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 79b4a58..c0e7378 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -726,11 +726,26 @@ static void msi_domain_free(struct irq_domain *domain, unsigned int virq,
irq_domain_free_irqs_top(domain, virq, nr_irqs);
}
+static int msi_domain_translate(struct irq_domain *domain, struct irq_fwspec *fwspec,
+ irq_hw_number_t *hwirq, unsigned int *type)
+{
+ struct msi_domain_info *info = domain->host_data;
+
+ /*
+ * This will catch allocations through the regular irqdomain path except
+ * for MSI domains which really support this, e.g. MBIGEN.
+ */
+ if (!info->ops->msi_translate)
+ return -ENOTSUPP;
+ return info->ops->msi_translate(domain, fwspec, hwirq, type);
+}
+
static const struct irq_domain_ops msi_domain_ops = {
.alloc = msi_domain_alloc,
.free = msi_domain_free,
.activate = msi_domain_activate,
.deactivate = msi_domain_deactivate,
+ .translate = msi_domain_translate,
};
static irq_hw_number_t msi_domain_ops_get_hwirq(struct msi_domain_info *info,
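To illustrate how the new callback is meant to be used, a wire-to-MSI bridge
could supply something along these lines. This is purely a sketch: the
example_* names and the two-cell fwspec layout are assumptions, only the
msi_translate signature comes from the patch:

static int example_bridge_msi_translate(struct irq_domain *domain, struct irq_fwspec *fwspec,
					irq_hw_number_t *hwirq, unsigned int *type)
{
	/* Assumed firmware encoding: cell 0 = wired line, cell 1 = trigger type */
	if (fwspec->param_count != 2)
		return -EINVAL;

	*hwirq = fwspec->param[0];
	*type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
	return 0;
}

static struct msi_domain_ops example_bridge_msi_ops = {
	.msi_translate	= example_bridge_msi_translate,
	/* other callbacks as needed by the bridge */
};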
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 14fd06c776b5289a43c91cdc64bac3bdbc7b397e
Gitweb: https://git.kernel.org/tip/14fd06c776b5289a43c91cdc64bac3bdbc7b397e
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:34 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:40 +01:00
irqchip: Convert all platform MSI users to the new API
Switch all the users of the platform MSI domain over to invoke the new
interfaces which branch to the original platform MSI functions when the
irqdomain associated with the caller device does not yet provide MSI parent
functionality.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/dma/mv_xor_v2.c | 8 ++++----
drivers/dma/qcom/hidma.c | 6 +++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +++--
drivers/mailbox/bcm-flexrm-mailbox.c | 8 ++++----
drivers/perf/arm_smmuv3_pmu.c | 4 ++--
drivers/ufs/host/ufs-qcom.c | 8 ++++----
6 files changed, 20 insertions(+), 19 deletions(-)
diff --git a/drivers/dma/mv_xor_v2.c b/drivers/dma/mv_xor_v2.c
index 1ebfbe8..97ebc79 100644
--- a/drivers/dma/mv_xor_v2.c
+++ b/drivers/dma/mv_xor_v2.c
@@ -747,8 +747,8 @@ static int mv_xor_v2_probe(struct platform_device *pdev)
if (IS_ERR(xor_dev->clk))
return PTR_ERR(xor_dev->clk);
- ret = platform_msi_domain_alloc_irqs(&pdev->dev, 1,
- mv_xor_v2_set_msi_msg);
+ ret = platform_device_msi_init_and_alloc_irqs(&pdev->dev, 1,
+ mv_xor_v2_set_msi_msg);
if (ret)
return ret;
@@ -851,7 +851,7 @@ free_hw_desq:
xor_dev->desc_size * MV_XOR_V2_DESC_NUM,
xor_dev->hw_desq_virt, xor_dev->hw_desq);
free_msi_irqs:
- platform_msi_domain_free_irqs(&pdev->dev);
+ platform_device_msi_free_irqs_all(&pdev->dev);
return ret;
}
@@ -867,7 +867,7 @@ static void mv_xor_v2_remove(struct platform_device *pdev)
devm_free_irq(&pdev->dev, xor_dev->irq, xor_dev);
- platform_msi_domain_free_irqs(&pdev->dev);
+ platform_device_msi_free_irqs_all(&pdev->dev);
tasklet_kill(&xor_dev->irq_tasklet);
}
diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c
index d63b93d..202ac95 100644
--- a/drivers/dma/qcom/hidma.c
+++ b/drivers/dma/qcom/hidma.c
@@ -696,7 +696,7 @@ static void hidma_free_msis(struct hidma_dev *dmadev)
devm_free_irq(dev, virq, &dmadev->lldev);
}
- platform_msi_domain_free_irqs(dev);
+ platform_device_msi_free_irqs_all(dev);
#endif
}
@@ -706,8 +706,8 @@ static int hidma_request_msi(struct hidma_dev *dmadev,
#ifdef CONFIG_GENERIC_MSI_IRQ
int rc, i, virq;
- rc = platform_msi_domain_alloc_irqs(&pdev->dev, HIDMA_MSI_INTS,
- hidma_write_msi_msg);
+ rc = platform_device_msi_init_and_alloc_irqs(&pdev->dev, HIDMA_MSI_INTS,
+ hidma_write_msi_msg);
if (rc)
return rc;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0ffb1cf..a74a509 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3125,7 +3125,8 @@ static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
static void arm_smmu_free_msis(void *data)
{
struct device *dev = data;
- platform_msi_domain_free_irqs(dev);
+
+ platform_device_msi_free_irqs_all(dev);
}
static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
@@ -3166,7 +3167,7 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
}
/* Allocate MSIs for evtq, gerror and priq. Ignore cmdq */
- ret = platform_msi_domain_alloc_irqs(dev, nvec, arm_smmu_write_msi_msg);
+ ret = platform_device_msi_init_and_alloc_irqs(dev, nvec, arm_smmu_write_msi_msg);
if (ret) {
dev_warn(dev, "failed to allocate MSIs - falling back to wired irqs\n");
return;
diff --git a/drivers/mailbox/bcm-flexrm-mailbox.c b/drivers/mailbox/bcm-flexrm-mailbox.c
index e3e28a4..b1abc2a 100644
--- a/drivers/mailbox/bcm-flexrm-mailbox.c
+++ b/drivers/mailbox/bcm-flexrm-mailbox.c
@@ -1587,8 +1587,8 @@ static int flexrm_mbox_probe(struct platform_device *pdev)
}
/* Allocate platform MSIs for each ring */
- ret = platform_msi_domain_alloc_irqs(dev, mbox->num_rings,
- flexrm_mbox_msi_write);
+ ret = platform_device_msi_init_and_alloc_irqs(dev, mbox->num_rings,
+ flexrm_mbox_msi_write);
if (ret)
goto fail_destroy_cmpl_pool;
@@ -1641,7 +1641,7 @@ skip_debugfs:
fail_free_debugfs_root:
debugfs_remove_recursive(mbox->root);
- platform_msi_domain_free_irqs(dev);
+ platform_device_msi_free_irqs_all(dev);
fail_destroy_cmpl_pool:
dma_pool_destroy(mbox->cmpl_pool);
fail_destroy_bd_pool:
@@ -1657,7 +1657,7 @@ static void flexrm_mbox_remove(struct platform_device *pdev)
debugfs_remove_recursive(mbox->root);
- platform_msi_domain_free_irqs(dev);
+ platform_device_msi_free_irqs_all(dev);
dma_pool_destroy(mbox->cmpl_pool);
dma_pool_destroy(mbox->bd_pool);
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 6303b82..9e5d7fa 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -716,7 +716,7 @@ static void smmu_pmu_free_msis(void *data)
{
struct device *dev = data;
- platform_msi_domain_free_irqs(dev);
+ platform_device_msi_free_irqs_all(dev);
}
static void smmu_pmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
@@ -746,7 +746,7 @@ static void smmu_pmu_setup_msi(struct smmu_pmu *pmu)
if (!(readl(pmu->reg_base + SMMU_PMCG_CFGR) & SMMU_PMCG_CFGR_MSI))
return;
- ret = platform_msi_domain_alloc_irqs(dev, 1, smmu_pmu_write_msi_msg);
+ ret = platform_device_msi_init_and_alloc_irqs(dev, 1, smmu_pmu_write_msi_msg);
if (ret) {
dev_warn(dev, "failed to allocate MSIs\n");
return;
diff --git a/drivers/ufs/host/ufs-qcom.c b/drivers/ufs/host/ufs-qcom.c
index 39eef47..8fde520 100644
--- a/drivers/ufs/host/ufs-qcom.c
+++ b/drivers/ufs/host/ufs-qcom.c
@@ -1712,8 +1712,8 @@ static int ufs_qcom_config_esi(struct ufs_hba *hba)
* 2. Poll queues do not need ESI.
*/
nr_irqs = hba->nr_hw_queues - hba->nr_queues[HCTX_TYPE_POLL];
- ret = platform_msi_domain_alloc_irqs(hba->dev, nr_irqs,
- ufs_qcom_write_msi_msg);
+ ret = platform_device_msi_init_and_alloc_irqs(hba->dev, nr_irqs,
+ ufs_qcom_write_msi_msg);
if (ret) {
dev_err(hba->dev, "Failed to request Platform MSI %d\n", ret);
return ret;
@@ -1742,7 +1742,7 @@ static int ufs_qcom_config_esi(struct ufs_hba *hba)
devm_free_irq(hba->dev, desc->irq, hba);
}
msi_unlock_descs(hba->dev);
- platform_msi_domain_free_irqs(hba->dev);
+ platform_device_msi_free_irqs_all(hba->dev);
} else {
if (host->hw_ver.major == 6 && host->hw_ver.minor == 0 &&
host->hw_ver.step == 0)
@@ -1818,7 +1818,7 @@ static void ufs_qcom_remove(struct platform_device *pdev)
pm_runtime_get_sync(&(pdev)->dev);
ufshcd_remove(hba);
- platform_msi_domain_free_irqs(hba->dev);
+ platform_device_msi_free_irqs_all(hba->dev);
}
static const struct of_device_id ufs_qcom_of_match[] __maybe_unused = {
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: c88f9110bfbca5975a8dee4c9792ba12684c7bca
Gitweb: https://git.kernel.org/tip/c88f9110bfbca5975a8dee4c9792ba12684c7bca
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:33 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:40 +01:00
platform-msi: Prepare for real per device domains
Provide functions to create and remove per device MSI domains which replace
the platform-MSI domains. The new model is that each device which utilizes
platform-MSI now gets its own private MSI domain, which is "customized" in
size and with a device specific function to write the MSI message into the
device.
This provides the same functionality as platform-MSI but avoids all the
downsides of platform-MSI, i.e. the extra ID bookkeeping and the special data
structure in the MSI descriptor. Further, the domains are only created when
the devices are really in use, so the burden is on the usage and not on the
infrastructure.
Fill in the domain template and provide two functions to init/allocate and
remove a per device MSI domain.
Until all users and parent domain providers are converted, the init/alloc
function invokes the original platform-MSI code when the irqdomain which is
associated with the device does not yet provide MSI parent functionality.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/base/platform-msi.c | 103 +++++++++++++++++++++++++++++++++++-
include/linux/msi.h | 4 +-
2 files changed, 107 insertions(+)
diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index f37ad34..b56e919 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -13,6 +13,8 @@
#include <linux/msi.h>
#include <linux/slab.h>
+/* Begin of removal area. Once everything is converted over. Cleanup the includes too! */
+
#define DEV_ID_SHIFT 21
#define MAX_DEV_MSIS (1 << (32 - DEV_ID_SHIFT))
@@ -350,3 +352,104 @@ int platform_msi_device_domain_alloc(struct irq_domain *domain, unsigned int vir
return msi_domain_populate_irqs(domain->parent, dev, virq, nr_irqs, &data->arg);
}
+
+/* End of removal area */
+
+/* Real per device domain interfaces */
+
+/*
+ * This indirection can go when platform_device_msi_init_and_alloc_irqs()
+ * is switched to a proper irq_chip::irq_write_msi_msg() callback. Keep it
+ * simple for now.
+ */
+static void platform_msi_write_msi_msg(struct irq_data *d, struct msi_msg *msg)
+{
+ irq_write_msi_msg_t cb = d->chip_data;
+
+ cb(irq_data_get_msi_desc(d), msg);
+}
+
+static void platform_msi_set_desc_byindex(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+ arg->desc = desc;
+ arg->hwirq = desc->msi_index;
+}
+
+static const struct msi_domain_template platform_msi_template = {
+ .chip = {
+ .name = "pMSI",
+ .irq_mask = irq_chip_mask_parent,
+ .irq_unmask = irq_chip_unmask_parent,
+ .irq_write_msi_msg = platform_msi_write_msi_msg,
+ /* The rest is filled in by the platform MSI parent */
+ },
+
+ .ops = {
+ .set_desc = platform_msi_set_desc_byindex,
+ },
+
+ .info = {
+ .bus_token = DOMAIN_BUS_DEVICE_MSI,
+ },
+};
+
+/**
+ * platform_device_msi_init_and_alloc_irqs - Initialize platform device MSI
+ * and allocate interrupts for @dev
+ * @dev: The device for which to allocate interrupts
+ * @nvec: The number of interrupts to allocate
+ * @write_msi_msg: Callback to write an interrupt message for @dev
+ *
+ * Returns:
+ * Zero for success, or an error code in case of failure
+ *
+ * This creates a MSI domain on @dev which has @dev->msi.domain as
+ * parent. The parent domain sets up the new domain. The domain has
+ * a fixed size of @nvec. The domain is managed by devres and will
+ * be removed when the device is removed.
+ *
+ * Note: For migration purposes this falls back to the original platform_msi code
+ * up to the point where all platforms have been converted to the MSI
+ * parent model.
+ */
+int platform_device_msi_init_and_alloc_irqs(struct device *dev, unsigned int nvec,
+ irq_write_msi_msg_t write_msi_msg)
+{
+ struct irq_domain *domain = dev->msi.domain;
+
+ if (!domain || !write_msi_msg)
+ return -EINVAL;
+
+ /* Migration support. Will go away once everything is converted */
+ if (!irq_domain_is_msi_parent(domain))
+ return platform_msi_domain_alloc_irqs(dev, nvec, write_msi_msg);
+
+ /*
+ * @write_msi_msg is stored in the resulting msi_domain_info::data.
+ * The underlying domain creation mechanism will assign that
+ * callback to the resulting irq chip.
+ */
+ if (!msi_create_device_irq_domain(dev, MSI_DEFAULT_DOMAIN,
+ &platform_msi_template,
+ nvec, NULL, write_msi_msg))
+ return -ENODEV;
+
+ return msi_domain_alloc_irqs_range(dev, MSI_DEFAULT_DOMAIN, 0, nvec - 1);
+}
+EXPORT_SYMBOL_GPL(platform_device_msi_init_and_alloc_irqs);
+
+/**
+ * platform_device_msi_free_irqs_all - Free all interrupts for @dev
+ * @dev: The device for which to free interrupts
+ */
+void platform_device_msi_free_irqs_all(struct device *dev)
+{
+ struct irq_domain *domain = dev->msi.domain;
+
+ msi_domain_free_irqs_all(dev, MSI_DEFAULT_DOMAIN);
+
+ /* Migration support. Will go away once everything is converted */
+ if (!irq_domain_is_msi_parent(domain))
+ platform_msi_free_priv_data(dev);
+}
+EXPORT_SYMBOL_GPL(platform_device_msi_free_irqs_all);
diff --git a/include/linux/msi.h b/include/linux/msi.h
index d5d1513..ef16796 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -664,6 +664,10 @@ int platform_msi_device_domain_alloc(struct irq_domain *domain, unsigned int vir
void platform_msi_device_domain_free(struct irq_domain *domain, unsigned int virq,
unsigned int nvec);
void *platform_msi_get_host_data(struct irq_domain *domain);
+/* Per device platform MSI */
+int platform_device_msi_init_and_alloc_irqs(struct device *dev, unsigned int nvec,
+ irq_write_msi_msg_t write_msi_msg);
+void platform_device_msi_free_irqs_all(struct device *dev);
bool msi_device_has_isolated_msi(struct device *dev);
#else /* CONFIG_GENERIC_MSI_IRQ */
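As a usage sketch, mirroring the driver conversions merged above (the
example_* names are placeholders; example_write_msi_msg would be the driver's
irq_write_msi_msg_t callback):

static int example_probe(struct platform_device *pdev)
{
	int ret;

	/* Creates the per device MSI domain and allocates 4 interrupts */
	ret = platform_device_msi_init_and_alloc_irqs(&pdev->dev, 4, example_write_msi_msg);
	if (ret)
		return ret;

	/* The Linux interrupt numbers can then be looked up via msi_get_virq() */
	return 0;
}

static void example_remove(struct platform_device *pdev)
{
	platform_device_msi_free_irqs_all(&pdev->dev);
}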
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 6516d5a295356f8fd5827a1c0954d7ed5b2324dd
Gitweb: https://git.kernel.org/tip/6516d5a295356f8fd5827a1c0954d7ed5b2324dd
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:32 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:40 +01:00
genirq/irqdomain: Add DOMAIN_BUS_DEVICE_MSI
Add a new domain bus token to prepare for device MSI which aims to replace
the existing platform MSI maze.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/irqdomain_defs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index c29921f..a7dea0c 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -26,6 +26,7 @@ enum irq_domain_bus_token {
DOMAIN_BUS_DMAR,
DOMAIN_BUS_AMDVI,
DOMAIN_BUS_PCI_DEVICE_IMS,
+ DOMAIN_BUS_DEVICE_MSI,
};
#endif /* _LINUX_IRQDOMAIN_DEFS_H */
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: de1ff306dcf4546d6a8863b1f956335e0d3fbb9b
Gitweb: https://git.kernel.org/tip/de1ff306dcf4546d6a8863b1f956335e0d3fbb9b
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:30 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:39 +01:00
genirq/irqdomain: Remove the param count restriction from select()
Now that the GIC-v3 callback can handle invocation with a fwspec parameter
count of 0, lift the restriction in the core code and invoke select()
unconditionally when the domain provides it.
Preparatory change for per device MSI domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/irq/irqdomain.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 0bdef4f..8fee379 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -448,7 +448,7 @@ struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
*/
mutex_lock(&irq_domain_mutex);
list_for_each_entry(h, &irq_domain_list, link) {
- if (h->ops->select && fwspec->param_count)
+ if (h->ops->select)
rc = h->ops->select(h, fwspec, bus_token);
else if (h->ops->match)
rc = h->ops->match(h, to_of_node(fwnode), bus_token);
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 15137825100422c4c393c87af5aa5a8fa297b1f3
Gitweb: https://git.kernel.org/tip/15137825100422c4c393c87af5aa5a8fa297b1f3
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:29 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:39 +01:00
irqchip/gic-v3: Make gic_irq_domain_select() robust for zero parameter count
Currently the irqdomain select callback is only invoked when the parameter
count of the fwspec arguments is not zero. That makes sense because then
the match is on the firmware node and eventually on the bus_token, which is
already handled in the core code.
The upcoming support for per device MSI domains requires doing real bus
token specific checks in the MSI parent domains with a zero parameter
count.
Make the gic-v3 select() callback handle that case.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/irqchip/irq-gic-v3.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 98b0329..35b9362 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1702,9 +1702,13 @@ static int gic_irq_domain_select(struct irq_domain *d,
irq_hw_number_t hwirq;
/* Not for us */
- if (fwspec->fwnode != d->fwnode)
+ if (fwspec->fwnode != d->fwnode)
return 0;
+ /* Handle pure domain searches */
+ if (!fwspec->param_count)
+ return d->bus_token == bus_token;
+
/* If this is not DT, then we have a single domain */
if (!is_of_node(fwspec->fwnode))
return 1;
Anup!
On Thu, Feb 15 2024 at 11:18, Anup Patel wrote:
> On Thu, Feb 15, 2024 at 1:24 AM Thomas Gleixner <[email protected]> wrote:
>> Thanks for picking this up and driving it forward!
>
> Thanks Thomas, I will be sending v13 of this series next week.
>
> For the time being, I will carry the 13 infrastructure patches in
> this series until they land in upstream Linux so that it is easier
> for people to try this series.
I pushed out the lot on top of 6.8-rc4 (no other changes) to
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi
with some minimal changes (DEVICE_IMS -> DEVICE_MSI, removal of an
unused interface).
I'm going to go over the rest of the series after I've dealt with my other
patch backlog.
Thanks,
tglx
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 3095cc0d5b2c246ddfcb18f54ed5557640224b6a
Gitweb: https://git.kernel.org/tip/3095cc0d5b2c246ddfcb18f54ed5557640224b6a
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:36 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:40 +01:00
genirq/msi: Split msi_domain_alloc_irq_at()
In preparation for providing a special allocation function for wired
interrupts which are connected to a wire to MSI bridge, split the inner
workings of msi_domain_alloc_irq_at() out into a helper function so the
code can be shared.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/irq/msi.c | 76 ++++++++++++++++++++++++++---------------------
1 file changed, 43 insertions(+), 33 deletions(-)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index c0e7378..8d46390 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1446,34 +1446,10 @@ int msi_domain_alloc_irqs_all_locked(struct device *dev, unsigned int domid, int
return msi_domain_alloc_locked(dev, &ctrl);
}
-/**
- * msi_domain_alloc_irq_at - Allocate an interrupt from a MSI interrupt domain at
- * a given index - or at the next free index
- *
- * @dev: Pointer to device struct of the device for which the interrupts
- * are allocated
- * @domid: Id of the interrupt domain to operate on
- * @index: Index for allocation. If @index == %MSI_ANY_INDEX the allocation
- * uses the next free index.
- * @affdesc: Optional pointer to an interrupt affinity descriptor structure
- * @icookie: Optional pointer to a domain specific per instance cookie. If
- * non-NULL the content of the cookie is stored in msi_desc::data.
- * Must be NULL for MSI-X allocations
- *
- * This requires a MSI interrupt domain which lets the core code manage the
- * MSI descriptors.
- *
- * Return: struct msi_map
- *
- * On success msi_map::index contains the allocated index number and
- * msi_map::virq the corresponding Linux interrupt number
- *
- * On failure msi_map::index contains the error code and msi_map::virq
- * is %0.
- */
-struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
- const struct irq_affinity_desc *affdesc,
- union msi_instance_cookie *icookie)
+static struct msi_map __msi_domain_alloc_irq_at(struct device *dev, unsigned int domid,
+ unsigned int index,
+ const struct irq_affinity_desc *affdesc,
+ union msi_instance_cookie *icookie)
{
struct msi_ctrl ctrl = { .domid = domid, .nirqs = 1, };
struct irq_domain *domain;
@@ -1481,17 +1457,16 @@ struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, u
struct msi_desc *desc;
int ret;
- msi_lock_descs(dev);
domain = msi_get_device_domain(dev, domid);
if (!domain) {
map.index = -ENODEV;
- goto unlock;
+ return map;
}
desc = msi_alloc_desc(dev, 1, affdesc);
if (!desc) {
map.index = -ENOMEM;
- goto unlock;
+ return map;
}
if (icookie)
@@ -1500,7 +1475,7 @@ struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, u
ret = msi_insert_desc(dev, desc, domid, index);
if (ret) {
map.index = ret;
- goto unlock;
+ return map;
}
ctrl.first = ctrl.last = desc->msi_index;
@@ -1513,7 +1488,42 @@ struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, u
map.index = desc->msi_index;
map.virq = desc->irq;
}
-unlock:
+ return map;
+}
+
+/**
+ * msi_domain_alloc_irq_at - Allocate an interrupt from a MSI interrupt domain at
+ * a given index - or at the next free index
+ *
+ * @dev: Pointer to device struct of the device for which the interrupts
+ * are allocated
+ * @domid: Id of the interrupt domain to operate on
+ * @index: Index for allocation. If @index == %MSI_ANY_INDEX the allocation
+ * uses the next free index.
+ * @affdesc: Optional pointer to an interrupt affinity descriptor structure
+ * @icookie: Optional pointer to a domain specific per instance cookie. If
+ * non-NULL the content of the cookie is stored in msi_desc::data.
+ * Must be NULL for MSI-X allocations
+ *
+ * This requires a MSI interrupt domain which lets the core code manage the
+ * MSI descriptors.
+ *
+ * Return: struct msi_map
+ *
+ * On success msi_map::index contains the allocated index number and
+ * msi_map::virq the corresponding Linux interrupt number
+ *
+ * On failure msi_map::index contains the error code and msi_map::virq
+ * is %0.
+ */
+struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
+ const struct irq_affinity_desc *affdesc,
+ union msi_instance_cookie *icookie)
+{
+ struct msi_map map;
+
+ msi_lock_descs(dev);
+ map = __msi_domain_alloc_irq_at(dev, domid, index, affdesc, icookie);
msi_unlock_descs(dev);
return map;
}
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: ac81e94ab001c2882e89c9b61417caea64b800df
Gitweb: https://git.kernel.org/tip/ac81e94ab001c2882e89c9b61417caea64b800df
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 27 Jan 2024 21:47:31 +05:30
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 15 Feb 2024 17:55:40 +01:00
genirq/msi: Extend msi_parent_ops
Supporting per device MSI domains on ARM64, RISC-V and the zoo of
interrupt mechanisms needs a bit more information than what the
initial x86 implementation provides.
Add the following fields:
- required_flags: The flags which a parent domain requires to be set
- bus_select_token: The bus token of the parent domain for select()
- bus_select_mask: A bitmask of supported child domain bus types
This makes it possible to provide library functions which can be shared
between various interrupt chip implementations and avoids replicating mostly
similar code all over the place.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/msi.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index ddace8c..d5d1513 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -572,6 +572,11 @@ enum {
* struct msi_parent_ops - MSI parent domain callbacks and configuration info
*
* @supported_flags: Required: The supported MSI flags of the parent domain
+ * @required_flags: Optional: The required MSI flags of the parent MSI domain
+ * @bus_select_token: Optional: The bus token of the real parent domain for
+ * irq_domain::select()
+ * @bus_select_mask: Optional: A mask of supported BUS_DOMAINs for
+ * irq_domain::select()
* @prefix: Optional: Prefix for the domain and chip name
* @init_dev_msi_info: Required: Callback for MSI parent domains to setup parent
* domain specific domain flags, domain ops and interrupt chip
@@ -579,6 +584,9 @@ enum {
*/
struct msi_parent_ops {
u32 supported_flags;
+ u32 required_flags;
+ u32 bus_select_token;
+ u32 bus_select_mask;
const char *prefix;
bool (*init_dev_msi_info)(struct device *dev, struct irq_domain *domain,
struct irq_domain *msi_parent_domain,
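A parent domain would then typically fill these in along the following lines.
This is a sketch only: the example_* names, the flag choices and the mask
value are assumptions and not taken from the series; only the field names and
their documented semantics come from the patch:

static const struct msi_parent_ops example_msi_parent_ops = {
	.supported_flags	= MSI_GENERIC_FLAGS_MASK,
	.required_flags		= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS,
	.bus_select_token	= DOMAIN_BUS_NEXUS,
	/* Bitmask of child domain bus tokens this parent can serve */
	.bus_select_mask	= BIT(DOMAIN_BUS_PCI_MSI) | BIT(DOMAIN_BUS_PLATFORM_MSI),
	.prefix			= "EXAMPLE-",
	.init_dev_msi_info	= example_init_dev_msi_info,
};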
Hi Andrea,
On Thu, Feb 8, 2024 at 3:40 PM Andrea Parri <[email protected]> wrote:
>
> Hi Anup,
>
> I understand that some refactoring is in progress, but I don't see the
> report below; adding it here hoping that it can be useful towards v13.
> (Unfortunately, I didn't have enough time to debug this yet...)
>
>
> > irqchip/sifive-plic: Convert PLIC driver into a platform driver
>
> I'm seeing the following LOCKDEP warning with this series, bisected to
> the commit above. This is a defconfig + PROVE_LOCKING=y build, booted
> using -machine virt,aia=none.
>
> [ 0.953473] ========================================================
> [ 0.953704] WARNING: possible irq lock inversion dependency detected
> [ 0.953955] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
> [ 0.954224] --------------------------------------------------------
> [ 0.954444] swapper/0/0 just changed the state of lock:
> [ 0.954664] ffffaf808109d0c8 (&irq_desc_lock_class){-...}-{2:2}, at: handle_fasteoi_irq+0x24/0x1da
> [ 0.955699] but this lock took another, HARDIRQ-unsafe lock in the past:
> [ 0.955942] (&handler->enable_lock){+.+.}-{2:2}
> [ 0.955974]
>
> and interrupts could create inverse lock ordering between them.
>
> [ 0.956507]
> other info that might help us debug this:
> [ 0.956775] Possible interrupt unsafe locking scenario:
>
> [ 0.956998] CPU0 CPU1
> [ 0.957247] ---- ----
> [ 0.957439] lock(&handler->enable_lock);
> [ 0.957607] local_irq_disable();
> [ 0.957793] lock(&irq_desc_lock_class);
> [ 0.958021] lock(&handler->enable_lock);
> [ 0.958246] <Interrupt>
> [ 0.958342] lock(&irq_desc_lock_class);
> [ 0.958501]
> *** DEADLOCK ***
I was able to reproduce this warning.
After further digging, it turns out the locking safety issue existed
before the PLIC driver was converted into a platform driver but was never
caught by the lockdep validator because the driver was previously probed
very early using IRQCHIP_DECLARE().
I will include a separate patch in v13 to fix this warning.
Thanks,
Anup
>
> [ 0.958715] no locks held by swapper/0/0.
> [ 0.958870]
> the shortest dependencies between 2nd lock and 1st lock:
> [ 0.959152] -> (&handler->enable_lock){+.+.}-{2:2} {
> [ 0.959372] HARDIRQ-ON-W at:
> [ 0.959522] __lock_acquire+0x884/0x1f5c
> [ 0.959745] lock_acquire+0xf0/0x292
> [ 0.959913] _raw_spin_lock+0x2c/0x40
> [ 0.960090] plic_probe+0x322/0x65c
> [ 0.960257] platform_probe+0x4e/0x92
> [ 0.960432] really_probe+0x82/0x210
> [ 0.960598] __driver_probe_device+0x5c/0xd0
> [ 0.960784] driver_probe_device+0x2c/0xb0
> [ 0.960964] __driver_attach+0x72/0x10a
> [ 0.961151] bus_for_each_dev+0x60/0xac
> [ 0.961330] driver_attach+0x1a/0x22
> [ 0.961496] bus_add_driver+0xd4/0x19e
> [ 0.961666] driver_register+0x3e/0xd8
> [ 0.961835] __platform_driver_register+0x1c/0x24
> [ 0.962030] plic_driver_init+0x1a/0x22
> [ 0.962201] do_one_initcall+0x80/0x268
> [ 0.962371] kernel_init_freeable+0x296/0x300
> [ 0.962554] kernel_init+0x1e/0x10a
> [ 0.962713] ret_from_fork+0xe/0x1c
> [ 0.962884] SOFTIRQ-ON-W at:
> [ 0.962994] __lock_acquire+0x89e/0x1f5c
> [ 0.963169] lock_acquire+0xf0/0x292
> [ 0.963336] _raw_spin_lock+0x2c/0x40
> [ 0.963497] plic_probe+0x322/0x65c
> [ 0.963664] platform_probe+0x4e/0x92
> [ 0.963849] really_probe+0x82/0x210
> [ 0.964054] __driver_probe_device+0x5c/0xd0
> [ 0.964255] driver_probe_device+0x2c/0xb0
> [ 0.964428] __driver_attach+0x72/0x10a
> [ 0.964603] bus_for_each_dev+0x60/0xac
> [ 0.964777] driver_attach+0x1a/0x22
> [ 0.964943] bus_add_driver+0xd4/0x19e
> [ 0.965343] driver_register+0x3e/0xd8
> [ 0.965527] __platform_driver_register+0x1c/0x24
> [ 0.965732] plic_driver_init+0x1a/0x22
> [ 0.965908] do_one_initcall+0x80/0x268
> [ 0.966078] kernel_init_freeable+0x296/0x300
> [ 0.966268] kernel_init+0x1e/0x10a
> [ 0.966436] ret_from_fork+0xe/0x1c
> [ 0.966599] INITIAL USE at:
> [ 0.966716] __lock_acquire+0x3fc/0x1f5c
> [ 0.966891] lock_acquire+0xf0/0x292
> [ 0.967048] _raw_spin_lock+0x2c/0x40
> [ 0.967206] plic_probe+0x322/0x65c
> [ 0.967360] platform_probe+0x4e/0x92
> [ 0.967522] really_probe+0x82/0x210
> [ 0.967678] __driver_probe_device+0x5c/0xd0
> [ 0.967853] driver_probe_device+0x2c/0xb0
> [ 0.968025] __driver_attach+0x72/0x10a
> [ 0.968185] bus_for_each_dev+0x60/0xac
> [ 0.968348] driver_attach+0x1a/0x22
> [ 0.968513] bus_add_driver+0xd4/0x19e
> [ 0.968678] driver_register+0x3e/0xd8
> [ 0.968839] __platform_driver_register+0x1c/0x24
> [ 0.969035] plic_driver_init+0x1a/0x22
> [ 0.969239] do_one_initcall+0x80/0x268
> [ 0.969431] kernel_init_freeable+0x296/0x300
> [ 0.969610] kernel_init+0x1e/0x10a
> [ 0.969766] ret_from_fork+0xe/0x1c
> [ 0.969936] }
> [ 0.970010] ... key at: [<ffffffff824f4138>] __key.2+0x0/0x10
> [ 0.970224] ... acquired at:
> [ 0.970353] lock_acquire+0xf0/0x292
> [ 0.970482] _raw_spin_lock+0x2c/0x40
> [ 0.970609] plic_irq_enable+0x7e/0x140
> [ 0.970739] irq_enable+0x2c/0x60
> [ 0.970882] __irq_startup+0x58/0x60
> [ 0.971008] irq_startup+0x5e/0x13c
> [ 0.971126] __setup_irq+0x4de/0x5da
> [ 0.971248] request_threaded_irq+0xcc/0x12e
> [ 0.971394] vm_find_vqs+0x62/0x50a
> [ 0.971518] probe_common+0xfe/0x1d2
> [ 0.971635] virtrng_probe+0xc/0x14
> [ 0.971751] virtio_dev_probe+0x154/0x1fc
> [ 0.971878] really_probe+0x82/0x210
> [ 0.972008] __driver_probe_device+0x5c/0xd0
> [ 0.972147] driver_probe_device+0x2c/0xb0
> [ 0.972280] __driver_attach+0x72/0x10a
> [ 0.972407] bus_for_each_dev+0x60/0xac
> [ 0.972540] driver_attach+0x1a/0x22
> [ 0.972656] bus_add_driver+0xd4/0x19e
> [ 0.972777] driver_register+0x3e/0xd8
> [ 0.972896] register_virtio_driver+0x1c/0x2a
> [ 0.973049] virtio_rng_driver_init+0x18/0x20
> [ 0.973236] do_one_initcall+0x80/0x268
> [ 0.973399] kernel_init_freeable+0x296/0x300
> [ 0.973540] kernel_init+0x1e/0x10a
> [ 0.973658] ret_from_fork+0xe/0x1c
>
> [ 0.973858] -> (&irq_desc_lock_class){-...}-{2:2} {
> [ 0.974036] IN-HARDIRQ-W at:
> [ 0.974142] __lock_acquire+0xa82/0x1f5c
> [ 0.974309] lock_acquire+0xf0/0x292
> [ 0.974467] _raw_spin_lock+0x2c/0x40
> [ 0.974625] handle_fasteoi_irq+0x24/0x1da
> [ 0.974794] generic_handle_domain_irq+0x1c/0x2a
> [ 0.974982] plic_handle_irq+0x7e/0xf0
> [ 0.975143] generic_handle_domain_irq+0x1c/0x2a
> [ 0.975329] riscv_intc_irq+0x2e/0x46
> [ 0.975488] handle_riscv_irq+0x4a/0x74
> [ 0.975652] call_on_irq_stack+0x32/0x40
> [ 0.975817] INITIAL USE at:
> [ 0.975923] __lock_acquire+0x3fc/0x1f5c
> [ 0.976087] lock_acquire+0xf0/0x292
> [ 0.976244] _raw_spin_lock_irqsave+0x3a/0x64
> [ 0.976423] __irq_get_desc_lock+0x5a/0x84
> [ 0.976594] irq_modify_status+0x2a/0x106
> [ 0.976764] irq_set_percpu_devid+0x62/0x78
> [ 0.976939] riscv_intc_domain_map+0x1e/0x54
> [ 0.977133] irq_domain_associate_locked+0x42/0xe4
> [ 0.977363] irq_create_mapping_affinity+0x98/0xd4
> [ 0.977570] sbi_ipi_init+0x70/0x142
> [ 0.977744] init_IRQ+0xfe/0x11a
> [ 0.977906] start_kernel+0x4ae/0x790
> [ 0.978082] }
> [ 0.978151] ... key at: [<ffffffff824bbee0>] irq_desc_lock_class+0x0/0x10
> [ 0.978389] ... acquired at:
> [ 0.978494] mark_lock+0x3fe/0x8c2
> [ 0.978624] __lock_acquire+0xa82/0x1f5c
> [ 0.978766] lock_acquire+0xf0/0x292
> [ 0.978897] _raw_spin_lock+0x2c/0x40
> [ 0.979029] handle_fasteoi_irq+0x24/0x1da
> [ 0.979171] generic_handle_domain_irq+0x1c/0x2a
> [ 0.979326] plic_handle_irq+0x7e/0xf0
> [ 0.979460] generic_handle_domain_irq+0x1c/0x2a
> [ 0.979618] riscv_intc_irq+0x2e/0x46
> [ 0.979751] handle_riscv_irq+0x4a/0x74
> [ 0.979888] call_on_irq_stack+0x32/0x40
>
> [ 0.980110]
> stack backtrace:
> [ 0.980358] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
> [ 0.980662] Hardware name: riscv-virtio,qemu (DT)
> [ 0.980913] Call Trace:
> [ 0.981042] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
> [ 0.981246] [<ffffffff80ae020a>] show_stack+0x2c/0x38
> [ 0.981456] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
> [ 0.981648] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
> [ 0.981813] [<ffffffff80ae17a4>] print_irq_inversion_bug.part.0+0x162/0x176
> [ 0.982031] [<ffffffff8007c6e6>] mark_lock+0x3fe/0x8c2
> [ 0.982198] [<ffffffff8007d888>] __lock_acquire+0xa82/0x1f5c
> [ 0.982377] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
> [ 0.982549] [<ffffffff80af9962>] _raw_spin_lock+0x2c/0x40
> [ 0.982721] [<ffffffff8008f3fe>] handle_fasteoi_irq+0x24/0x1da
> [ 0.982904] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
> [ 0.983112] [<ffffffff80581dc0>] plic_handle_irq+0x7e/0xf0
> [ 0.983293] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
> [ 0.983495] [<ffffffff8057fb1a>] riscv_intc_irq+0x2e/0x46
> [ 0.983671] [<ffffffff80aedb4c>] handle_riscv_irq+0x4a/0x74
> [ 0.983856] [<ffffffff80afa756>] call_on_irq_stack+0x32/0x40
>
>
> When I switch to -machine virt,aia=aplic-imsic (same config as above), I
> get:
>
> [ 0.971406] ============================================
> [ 0.971439] WARNING: possible recursive locking detected
> [ 0.971497] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
> [ 0.971583] --------------------------------------------
> [ 0.971612] swapper/0/1 is trying to acquire lock:
> [ 0.971662] ffffaf83fefa8e78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x92/0x146
> [ 0.971927]
> but task is already holding lock:
> [ 0.971975] ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
> [ 0.972045]
> other info that might help us debug this:
> [ 0.972085] Possible unsafe locking scenario:
>
> [ 0.972114] CPU0
> [ 0.972133] ----
> [ 0.972153] lock(&lpriv->ids_lock);
> [ 0.972191] lock(&lpriv->ids_lock);
> [ 0.972228]
> *** DEADLOCK ***
>
> [ 0.972258] May be due to missing lock nesting notation
>
> [ 0.972306] 6 locks held by swapper/0/1:
> [ 0.972338] #0: ffffaf8081f65970 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x6a/0x10a
> [ 0.972413] #1: ffffaf808217c240 (&desc->request_mutex){+.+.}-{3:3}, at: __setup_irq+0xa2/0x5da
> [ 0.972492] #2: ffffaf808217c0c8 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xbe/0x5da
> [ 0.972555] #3: ffffffff81892ac0 (mask_lock){....}-{2:2}, at: irq_setup_affinity+0x38/0xc6
> [ 0.972617] #4: ffffffff81892a80 (tmp_mask_lock){....}-{2:2}, at: irq_do_set_affinity+0x3a/0x164
> [ 0.972681] #5: ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
> [ 0.972753]
> stack backtrace:
> [ 0.972852] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
> [ 0.972900] Hardware name: riscv-virtio,qemu (DT)
> [ 0.972987] Call Trace:
> [ 0.973019] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
> [ 0.973054] [<ffffffff80ae020a>] show_stack+0x2c/0x38
> [ 0.973083] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
> [ 0.973112] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
> [ 0.973139] [<ffffffff8007ad5e>] print_deadlock_bug+0x282/0x328
> [ 0.973168] [<ffffffff8007e15c>] __lock_acquire+0x1356/0x1f5c
> [ 0.973198] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
> [ 0.973225] [<ffffffff80af9adc>] _raw_spin_lock_irqsave+0x3a/0x64
> [ 0.973255] [<ffffffff80581210>] imsic_vector_move+0x92/0x146
> [ 0.973285] [<ffffffff80581a04>] imsic_irq_set_affinity+0x8e/0xc6
> [ 0.973315] [<ffffffff8008c86a>] irq_do_set_affinity+0x142/0x164
> [ 0.973345] [<ffffffff8008cc22>] irq_setup_affinity+0x68/0xc6
> [ 0.973374] [<ffffffff8008fa82>] irq_startup+0x72/0x13c
> [ 0.973401] [<ffffffff8008d40c>] __setup_irq+0x4de/0x5da
> [ 0.973430] [<ffffffff8008d5d4>] request_threaded_irq+0xcc/0x12e
> [ 0.973460] [<ffffffff806346d8>] vp_find_vqs_msix+0x114/0x376
> [ 0.973491] [<ffffffff80634970>] vp_find_vqs+0x36/0x136
> [ 0.973518] [<ffffffff80633280>] vp_modern_find_vqs+0x16/0x4e
> [ 0.973547] [<ffffffff80ab31f8>] p9_virtio_probe+0x8e/0x31c
> [ 0.973576] [<ffffffff8062d982>] virtio_dev_probe+0x154/0x1fc
> [ 0.973605] [<ffffffff80693738>] really_probe+0x82/0x210
> [ 0.973632] [<ffffffff80693922>] __driver_probe_device+0x5c/0xd0
> [ 0.973661] [<ffffffff806939c2>] driver_probe_device+0x2c/0xb0
> [ 0.973690] [<ffffffff80693b46>] __driver_attach+0x72/0x10a
> [ 0.973718] [<ffffffff8069191a>] bus_for_each_dev+0x60/0xac
> [ 0.973746] [<ffffffff80693164>] driver_attach+0x1a/0x22
> [ 0.973773] [<ffffffff80692ade>] bus_add_driver+0xd4/0x19e
> [ 0.973801] [<ffffffff8069487e>] driver_register+0x3e/0xd8
> [ 0.973829] [<ffffffff8062d1ce>] register_virtio_driver+0x1c/0x2a
> [ 0.973858] [<ffffffff80c3da52>] p9_virtio_init+0x36/0x56
> [ 0.973887] [<ffffffff800028fe>] do_one_initcall+0x80/0x268
> [ 0.973915] [<ffffffff80c01144>] kernel_init_freeable+0x296/0x300
> [ 0.973944] [<ffffffff80af05dc>] kernel_init+0x1e/0x10a
> [ 0.973972] [<ffffffff80afa716>] ret_from_fork+0xe/0x1c
>
>
> FWIW, the full Qemu command I used was as follows:
>
> sudo /home/andrea/Downloads/qemu/build/qemu-system-riscv64 \
> -append "root=/dev/root rw rootfstype=9p rootflags=version=9p2000.L,trans=virtio,cache=mmap,access=any raid=noautodetect security=none loglevel=7" \
> -cpu rv64,sv57=off,svadu=off,svnapot=off \
> -device virtio-net-device,netdev=net0 \
> -device virtio-rng-device,rng=rng0 \
> -device virtio-9p-pci,fsdev=root,mount_tag=/dev/root \
> -fsdev local,id=root,path=/home/andrea/Downloads/jammy/,security_model=none \
> -kernel /home/andrea/linux/arch/riscv/boot/Image \
> -m 16G \
> -machine virt,aia=<either "none" or "aplic-imsic"> \
> -monitor telnet:127.0.0.1:55555,server,nowait \
> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
> -nographic \
> -object rng-random,filename=/dev/urandom,id=rng0 \
> -serial mon:stdio \
> -smp 5
>
>
> Andrea
On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> + priv->irqdomain = irq_domain_create_linear(dev->fwnode, nr_irqs + 1,
> + &plic_irqdomain_ops, priv);
> + if (WARN_ON(!priv->irqdomain))
> + return -ENOMEM;
While some of the stuff is cleaned up by devm, the error handling in
this code looks pretty fragile as it leaves initialized contexts,
hardware state, chained handlers etc. around.
The question is whether the system can actually boot or work at all if
any of this fails.
> +
> /*
> * We can have multiple PLIC instances so setup cpuhp state
> - * and register syscore operations only when context handler
> - * for current/boot CPU is present.
> + * and register syscore operations only after context handlers
> + * of all online CPUs are initialized.
> */
> - handler = this_cpu_ptr(&plic_handlers);
> - if (handler->present && !plic_cpuhp_setup_done) {
> + cpuhp_setup = true;
> + for_each_online_cpu(cpu) {
> + handler = per_cpu_ptr(&plic_handlers, cpu);
> + if (!handler->present) {
> + cpuhp_setup = false;
> + break;
> + }
> + }
> + if (cpuhp_setup) {
> cpuhp_setup_state(CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING,
> "irqchip/sifive/plic:starting",
> plic_starting_cpu, plic_dying_cpu);
> register_syscore_ops(&plic_irq_syscore_ops);
> - plic_cpuhp_setup_done = true;
I don't think that removing the setup protection is correct.
Assume you have maxcpus=N on the kernel command line, then the above
for_each_online_cpu() loop would result in cpuhp_setup == true when the
instances for the not onlined CPUs are set up, no?
Thanks,
tglx
On Fri, Feb 16, 2024 at 9:03 PM Thomas Gleixner <[email protected]> wrote:
>
> On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> > + priv->irqdomain = irq_domain_create_linear(dev->fwnode, nr_irqs + 1,
> > + &plic_irqdomain_ops, priv);
> > + if (WARN_ON(!priv->irqdomain))
> > + return -ENOMEM;
>
> While some of the stuff is cleaned up by devm, the error handling in
> this code looks pretty fragile as it leaves initialized contexts,
> hardware state, chained handlers etc. around.
Sure, let me try to improve the error handling.
>
> The question is whether the system can actually boot or work at all if
> any of this fails.
On platforms with a PLIC, the PLIC only manages wired interrupts whereas
IPIs are provided through SBI (the firmware interface), so a system can
actually continue and boot further without the PLIC.
In fact, we do have a synthetic platform (namely QEMU spike) where there
is no PLIC instance and Linux boots using an SBI-based polling console.
>
> > +
> > /*
> > * We can have multiple PLIC instances so setup cpuhp state
> > - * and register syscore operations only when context handler
> > - * for current/boot CPU is present.
> > + * and register syscore operations only after context handlers
> > + * of all online CPUs are initialized.
> > */
> > - handler = this_cpu_ptr(&plic_handlers);
> > - if (handler->present && !plic_cpuhp_setup_done) {
> > + cpuhp_setup = true;
> > + for_each_online_cpu(cpu) {
> > + handler = per_cpu_ptr(&plic_handlers, cpu);
> > + if (!handler->present) {
> > + cpuhp_setup = false;
> > + break;
> > + }
> > + }
> > + if (cpuhp_setup) {
> > cpuhp_setup_state(CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING,
> > "irqchip/sifive/plic:starting",
> > plic_starting_cpu, plic_dying_cpu);
> > register_syscore_ops(&plic_irq_syscore_ops);
> > - plic_cpuhp_setup_done = true;
>
> I don't think that removing the setup protection is correct.
>
> Assume you have maxcpus=N on the kernel command line, then the above
> for_each_online_cpu() loop would result in cpuhp_setup == true when the
> instances for the not onlined CPUs are set up, no?
A platform can have multiple PLIC instances where each PLIC instance
targets a subset of HARTs (or CPUs).
Previously (before this patch), the PLIC was probed very early, so on a
platform with multiple PLIC instances we needed to ensure that the cpuhp
setup was done only after the PLIC context associated with the boot CPU
was initialized, hence the plic_cpuhp_setup_done check.
This patch converts the PLIC driver into a platform driver, so PLIC
instances are now probed after all available CPUs have been brought up. In
this case, the cpuhp setup must be done only after the PLIC contexts of
all available CPUs are initialized, otherwise some of the CPUs crash in
plic_starting_cpu() due to the missing PLIC context initialization.
Regards,
Anup
On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> +
> +#ifdef CONFIG_SMP
> +static irqreturn_t imsic_local_sync_handler(int irq, void *data)
> +{
> + imsic_local_sync();
> + return IRQ_HANDLED;
> +}
> +
> +static void imsic_ipi_send(unsigned int cpu)
> +{
> + struct imsic_local_config *local =
> + per_cpu_ptr(imsic->global.local, cpu);
Let it stick out. We switched to line length 100 quite some time
ago. Applies to the rest of the series too.
> + writel_relaxed(IMSIC_IPI_ID, local->msi_va);
> +}
> +
> +static void imsic_ipi_starting_cpu(void)
> +{
> + /* Enable IPIs for current CPU. */
> + __imsic_id_set_enable(IMSIC_IPI_ID);
> +
> + /* Enable virtual IPI used for IMSIC ID synchronization */
> + enable_percpu_irq(imsic->ipi_virq, 0);
> +}
> +
> +static void imsic_ipi_dying_cpu(void)
> +{
> + /*
> + * Disable virtual IPI used for IMSIC ID synchronization so
> + * that we don't receive ID synchronization requests.
> + */
> + disable_percpu_irq(imsic->ipi_virq);
Shouldn't this disable the hardware too, i.e.
__imsic_id_clear_enable()
?
> +}
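For clarity, the suggested shape would be roughly the following (a sketch
only, reusing the identifiers quoted above):

static void imsic_ipi_dying_cpu(void)
{
	/* Stop receiving ID synchronization requests */
	disable_percpu_irq(imsic->ipi_virq);

	/* As suggested above: also mask the IPI id in the IMSIC hardware */
	__imsic_id_clear_enable(IMSIC_IPI_ID);
}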
> +
> +static int __init imsic_ipi_domain_init(void)
> +{
> + int virq;
> +
> + /* Create IMSIC IPI multiplexing */
> + virq = ipi_mux_create(IMSIC_NR_IPI, imsic_ipi_send);
> + if (virq <= 0)
> + return (virq < 0) ? virq : -ENOMEM;
> + imsic->ipi_virq = virq;
> +
> + /* First vIRQ is used for IMSIC ID synchronization */
> + virq = request_percpu_irq(imsic->ipi_virq, imsic_local_sync_handler,
> + "riscv-imsic-lsync", imsic->global.local);
> + if (virq)
> + return virq;
Please use a separate 'ret' variable. I had to read this 3 times to make
sense of it.
> + irq_set_status_flags(imsic->ipi_virq, IRQ_HIDDEN);
> + imsic->ipi_lsync_desc = irq_to_desc(imsic->ipi_virq);
What's so special about this particular IPI that it can't be handled
like all the other IPIs?
> +static int __init imsic_early_probe(struct fwnode_handle *fwnode)
> +{
> + int rc;
> + struct irq_domain *domain;
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations
> +
> + /* Find parent domain and register chained handler */
> + domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
> + DOMAIN_BUS_ANY);
> + if (!domain) {
> + pr_err("%pfwP: Failed to find INTC domain\n", fwnode);
> + return -ENOENT;
> + }
> + imsic_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
> + if (!imsic_parent_irq) {
> + pr_err("%pfwP: Failed to create INTC mapping\n", fwnode);
> + return -ENOENT;
> + }
> + irq_set_chained_handler(imsic_parent_irq, imsic_handle_irq);
> +
> + /* Initialize IPI domain */
> + rc = imsic_ipi_domain_init();
> + if (rc) {
> + pr_err("%pfwP: Failed to initialize IPI domain\n", fwnode);
> + return rc;
Leaves the chained handler around and enabled.
> diff --git a/drivers/irqchip/irq-riscv-imsic-state.c b/drivers/irqchip/irq-riscv-imsic-state.c
> +
> +#define imsic_csr_write(__c, __v) \
> +do { \
> + csr_write(CSR_ISELECT, __c); \
> + csr_write(CSR_IREG, __v); \
> +} while (0)
Any reason why these macros can't be inlines?
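For reference, the inline form being asked about would be roughly the sketch
below (parameter names are placeholders). Both CSR numbers are compile-time
constants, so the csr_write() requirement of a constant CSR operand should
still be satisfied from an inline:

static inline void imsic_csr_write(unsigned long reg, unsigned long val)
{
	csr_write(CSR_ISELECT, reg);
	csr_write(CSR_IREG, val);
}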
> +const struct imsic_global_config *imsic_get_global_config(void)
> +{
> + return imsic ? &imsic->global : NULL;
> +}
> +EXPORT_SYMBOL_GPL(imsic_get_global_config);
Why is this exported?
> +#define __imsic_id_read_clear_enabled(__id) \
> + __imsic_eix_read_clear((__id), false)
> +#define __imsic_id_read_clear_pending(__id) \
> + __imsic_eix_read_clear((__id), true)
Please use inlines.
> +void __imsic_eix_update(unsigned long base_id,
> + unsigned long num_id, bool pend, bool val)
> +{
> + unsigned long i, isel, ireg;
> + unsigned long id = base_id, last_id = base_id + num_id;
> +
> + while (id < last_id) {
> + isel = id / BITS_PER_LONG;
> + isel *= BITS_PER_LONG / IMSIC_EIPx_BITS;
> + isel += (pend) ? IMSIC_EIP0 : IMSIC_EIE0;
> +
> + ireg = 0;
> + for (i = id & (__riscv_xlen - 1);
> + (id < last_id) && (i < __riscv_xlen); i++) {
> + ireg |= BIT(i);
> + id++;
> + }
This lacks a comment explaining what it is doing.
> +
> + /*
> + * The IMSIC EIEx and EIPx registers are indirectly
> + * accessed via using ISELECT and IREG CSRs so we
> + * need to access these CSRs without getting preempted.
> + *
> + * All existing users of this function call this
> + * function with local IRQs disabled so we don't
> + * need to do anything special here.
> + */
> + if (val)
> + imsic_csr_set(isel, ireg);
> + else
> + imsic_csr_clear(isel, ireg);
> + }
> +}
> +
> +void imsic_local_sync(void)
> +{
> + struct imsic_local_priv *lpriv = this_cpu_ptr(imsic->lpriv);
> + struct imsic_local_config *mlocal;
> + struct imsic_vector *mvec;
> + unsigned long flags;
> + int i;
> +
> + raw_spin_lock_irqsave(&lpriv->ids_lock, flags);
> + for (i = 1; i <= imsic->global.nr_ids; i++) {
> + if (i == IMSIC_IPI_ID)
> + continue;
> +
> + if (test_bit(i, lpriv->ids_enabled_bitmap))
> + __imsic_id_set_enable(i);
> + else
> + __imsic_id_clear_enable(i);
> +
> + mvec = lpriv->ids_move[i];
> + lpriv->ids_move[i] = NULL;
> + if (mvec) {
> + if (__imsic_id_read_clear_pending(i)) {
> + mlocal = per_cpu_ptr(imsic->global.local,
> + mvec->cpu);
> + writel_relaxed(mvec->local_id, mlocal->msi_va);
> + }
> +
> + imsic_vector_free(&lpriv->vectors[i]);
> + }
Again an uncommented piece of magic which you will have forgotten what
it does 3 months down the road :)
> +
> + }
> + raw_spin_unlock_irqrestore(&lpriv->ids_lock, flags);
> +}
> +
> +void imsic_local_delivery(bool enable)
> +{
> + if (enable) {
> + imsic_csr_write(IMSIC_EITHRESHOLD, IMSIC_ENABLE_EITHRESHOLD);
> + imsic_csr_write(IMSIC_EIDELIVERY, IMSIC_ENABLE_EIDELIVERY);
> + return;
> + }
> +
> + imsic_csr_write(IMSIC_EIDELIVERY, IMSIC_DISABLE_EIDELIVERY);
> + imsic_csr_write(IMSIC_EITHRESHOLD, IMSIC_DISABLE_EITHRESHOLD);
> +}
> +
> +#ifdef CONFIG_SMP
> +static void imsic_remote_sync(unsigned int cpu)
> +{
> + /*
> + * We simply inject ID synchronization IPI to a target CPU
> + * if it is not same as the current CPU. The ipi_send_mask()
> + * implementation of IPI mux will inject ID synchronization
> + * IPI only for CPUs that have enabled it so offline CPUs
> + * won't receive IPI. An offline CPU will unconditionally
> + * synchronize IDs through imsic_starting_cpu() when the
> + * CPU is brought up.
> + */
> + if (cpu_online(cpu)) {
> + if (cpu != smp_processor_id())
> + __ipi_send_mask(imsic->ipi_lsync_desc, cpumask_of(cpu));
Still wondering why this can't use the regular API. There might be a
reason, but then it wants to be documented.
> + else
> + imsic_local_sync();
> + }
> +}
> +#else
> +static inline void imsic_remote_sync(unsigned int cpu)
> +{
> + imsic_local_sync();
> +}
> +#endif
> +
> +void imsic_vector_mask(struct imsic_vector *vec)
> +{
> + struct imsic_local_priv *lpriv;
> + unsigned long flags;
> +
> + lpriv = per_cpu_ptr(imsic->lpriv, vec->cpu);
> + if (WARN_ON(&lpriv->vectors[vec->local_id] != vec))
> + return;
> +
> + raw_spin_lock_irqsave(&lpriv->ids_lock, flags);
AFAICT, this is used from an irqchip callback:
static void imsic_irq_mask(struct irq_data *d)
{
imsic_vector_mask(irq_data_get_irq_chip_data(d));
}
So there is no need to use irqsave() here. Those callbacks always run with
interrupts disabled when called from the core.
> +void imsic_vector_move(struct imsic_vector *old_vec,
> + struct imsic_vector *new_vec)
> +{
> + struct imsic_local_priv *old_lpriv, *new_lpriv;
> + unsigned long flags, flags1;
> +
> + if (WARN_ON(old_vec->cpu == new_vec->cpu))
> + return;
> +
> + old_lpriv = per_cpu_ptr(imsic->lpriv, old_vec->cpu);
> + if (WARN_ON(&old_lpriv->vectors[old_vec->local_id] != old_vec))
> + return;
> +
> + new_lpriv = per_cpu_ptr(imsic->lpriv, new_vec->cpu);
> + if (WARN_ON(&new_lpriv->vectors[new_vec->local_id] != new_vec))
> + return;
> +
> + raw_spin_lock_irqsave(&old_lpriv->ids_lock, flags);
> + raw_spin_lock_irqsave(&new_lpriv->ids_lock, flags1);
Lockdep should yell at you for this, rightfully so. And not only because
of the missing nested() annotation.
Assume there are two CPUs setting affinity for two different interrupts.
CPU0 moves an interrupt to CPU1 and CPU1 moves another interrupt to
CPU0. The resulting lock order is:
CPU0 CPU1
lock(lpriv[CPU0]); lock(lpriv[CPU1]);
lock(lpriv[CPU1]); lock(lpriv[CPU0]);
a classic ABBA deadlock.
You always need to take those locks in the same order. Look at
double_raw_lock() in kernel/sched/sched.h.
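For reference, the pattern being pointed at looks roughly like this (from
kernel/sched/sched.h, quoted from memory): order the two locks by address so
every path acquires them in the same order, then take the second one nested.

static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
{
	if (l1 > l2)
		swap(l1, l2);

	raw_spin_lock(l1);
	raw_spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
}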
> + /* Unmask the new vector entry */
> + if (test_bit(old_vec->local_id, old_lpriv->ids_enabled_bitmap))
> + bitmap_set(new_lpriv->ids_enabled_bitmap,
> + new_vec->local_id, 1);
Either make that one line or please add brackets. See:
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#bracket-rules
> +static int __init imsic_local_init(void)
> +{
> + struct imsic_global_config *global = &imsic->global;
> + struct imsic_local_priv *lpriv;
> + struct imsic_vector *vec;
> + int cpu, i;
> +
> + /* Allocate per-CPU private state */
> + imsic->lpriv = alloc_percpu(typeof(*(imsic->lpriv)));
> + if (!imsic->lpriv)
> + return -ENOMEM;
> +
> + /* Setup per-CPU private state */
> + for_each_possible_cpu(cpu) {
> + lpriv = per_cpu_ptr(imsic->lpriv, cpu);
> +
> + raw_spin_lock_init(&lpriv->ids_lock);
> +
> + /* Allocate enabled bitmap */
> + lpriv->ids_enabled_bitmap = bitmap_zalloc(global->nr_ids + 1,
> + GFP_KERNEL);
> + if (!lpriv->ids_enabled_bitmap) {
> + imsic_local_cleanup();
> + return -ENOMEM;
> + }
> +
> + /* Allocate move array */
> + lpriv->ids_move = kcalloc(global->nr_ids + 1,
> + sizeof(*lpriv->ids_move), GFP_KERNEL);
> + if (!lpriv->ids_move) {
> + imsic_local_cleanup();
> + return -ENOMEM;
> + }
> +
> + /* Allocate vector array */
> + lpriv->vectors = kcalloc(global->nr_ids + 1,
> + sizeof(*lpriv->vectors), GFP_KERNEL);
> + if (!lpriv->vectors) {
> + imsic_local_cleanup();
> + return -ENOMEM;
Third instance of the same pattern. goto cleanup; perhaps?
> +struct imsic_vector *imsic_vector_alloc(unsigned int hwirq,
> + const struct cpumask *mask)
> +{
> + struct imsic_vector *vec = NULL;
> + struct imsic_local_priv *lpriv;
> + unsigned long flags;
> + unsigned int cpu;
> + int local_id;
> +
> + raw_spin_lock_irqsave(&imsic->matrix_lock, flags);
> + local_id = irq_matrix_alloc(imsic->matrix, mask, false, &cpu);
> + raw_spin_unlock_irqrestore(&imsic->matrix_lock, flags);
> + if (local_id < 0)
> + return NULL;
> +
> + lpriv = per_cpu_ptr(imsic->lpriv, cpu);
> + vec = &lpriv->vectors[local_id];
> + vec->hwirq = hwirq;
> +
> + return vec;
> +}
..
> +int imsic_hwirq_alloc(void)
> +{
> + int ret;
> + unsigned long flags;
> +
> + raw_spin_lock_irqsave(&imsic->hwirqs_lock, flags);
> + ret = bitmap_find_free_region(imsic->hwirqs_used_bitmap,
> + imsic->nr_hwirqs, 0);
> + raw_spin_unlock_irqrestore(&imsic->hwirqs_lock, flags);
> +
> + return ret;
> +}
This part is just to create a unique hwirq number, right?
> +
> + /* Find number of guest index bits in MSI address */
> + rc = of_property_read_u32(to_of_node(fwnode),
> + "riscv,guest-index-bits",
> + &global->guest_index_bits);
> + if (rc)
> + global->guest_index_bits = 0;
So here you get the index bits, but then 50 lines further down you do
sanity checking. Wouldn't it make sense to do that right here?
Same for the other bits.
> +
> +/*
> + * The IMSIC driver uses 1 IPI for ID synchronization and
> + * arch/riscv/kernel/smp.c require 6 IPIs so we fix the
> + * total number of IPIs to 8.
> + */
> +#define IMSIC_IPI_ID 1
> +#define IMSIC_NR_IPI 8
> +
> +struct imsic_vector {
> + /* Fixed details of the vector */
> + unsigned int cpu;
> + unsigned int local_id;
> + /* Details saved by driver in the vector */
> + unsigned int hwirq;
> +};
> +
> +struct imsic_local_priv {
> + /* Local state of interrupt identities */
> + raw_spinlock_t ids_lock;
> + unsigned long *ids_enabled_bitmap;
> + struct imsic_vector **ids_move;
> +
> + /* Local vector table */
> + struct imsic_vector *vectors;
Please make those structs tabular:
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#struct-declarations-and-initializers
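i.e. something like the following (purely a formatting change, same members
as quoted above):

struct imsic_local_priv {
	/* Local state of interrupt identities */
	raw_spinlock_t		ids_lock;
	unsigned long		*ids_enabled_bitmap;
	struct imsic_vector	**ids_move;

	/* Local vector table */
	struct imsic_vector	*vectors;
};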
> +void __imsic_eix_update(unsigned long base_id,
> + unsigned long num_id, bool pend, bool val);
> +
> +#define __imsic_id_set_enable(__id) \
> + __imsic_eix_update((__id), 1, false, true)
> +#define __imsic_id_clear_enable(__id) \
> + __imsic_eix_update((__id), 1, false, false)
inlines please.
Thanks,
tglx
On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> +static int imsic_cpu_page_phys(unsigned int cpu,
> + unsigned int guest_index,
> + phys_addr_t *out_msi_pa)
> +{
> + struct imsic_global_config *global;
> + struct imsic_local_config *local;
> +
> + global = &imsic->global;
> + local = per_cpu_ptr(global->local, cpu);
> +
> + if (BIT(global->guest_index_bits) <= guest_index)
> + return -EINVAL;
As the callsite does not care about the return value, just make this
function boolean and return true on success.
> + if (out_msi_pa)
> + *out_msi_pa = local->msi_pa +
> + (guest_index * IMSIC_MMIO_PAGE_SZ);
> +
> + return 0;
> +}
> +
> +static void imsic_irq_mask(struct irq_data *d)
> +{
> + imsic_vector_mask(irq_data_get_irq_chip_data(d));
> +}
> +
> +static void imsic_irq_unmask(struct irq_data *d)
> +{
> + imsic_vector_unmask(irq_data_get_irq_chip_data(d));
> +}
> +
> +static int imsic_irq_retrigger(struct irq_data *d)
> +{
> + struct imsic_vector *vec = irq_data_get_irq_chip_data(d);
> + struct imsic_local_config *local;
> +
> + if (WARN_ON(vec == NULL))
> + return -ENOENT;
> +
> + local = per_cpu_ptr(imsic->global.local, vec->cpu);
> + writel(vec->local_id, local->msi_va);
> + return 0;
> +}
> +
> +static void imsic_irq_compose_vector_msg(struct imsic_vector *vec,
> + struct msi_msg *msg)
> +{
> + phys_addr_t msi_addr;
> + int err;
> +
> + if (WARN_ON(vec == NULL))
> + return;
> +
> + err = imsic_cpu_page_phys(vec->cpu, 0, &msi_addr);
> + if (WARN_ON(err))
> + return;
if (WARN_ON(!imsic_cpu_page_phys(...)))
return
Hmm?
> +
> + msg->address_hi = upper_32_bits(msi_addr);
> + msg->address_lo = lower_32_bits(msi_addr);
> + msg->data = vec->local_id;
> +}
> +
> +static void imsic_irq_compose_msg(struct irq_data *d, struct msi_msg *msg)
> +{
> + imsic_irq_compose_vector_msg(irq_data_get_irq_chip_data(d), msg);
> +}
> +
> +#ifdef CONFIG_SMP
> +static void imsic_msi_update_msg(struct irq_data *d, struct imsic_vector *vec)
> +{
> + struct msi_msg msg[2] = { [1] = { }, };
> +
> + imsic_irq_compose_vector_msg(vec, msg);
> + irq_data_get_irq_chip(d)->irq_write_msi_msg(d, msg);
> +}
> +
> +static int imsic_irq_set_affinity(struct irq_data *d,
> + const struct cpumask *mask_val,
> + bool force)
> +{
> + struct imsic_vector *old_vec, *new_vec;
> + struct irq_data *pd = d->parent_data;
> +
> + old_vec = irq_data_get_irq_chip_data(pd);
> + if (WARN_ON(old_vec == NULL))
> + return -ENOENT;
> +
> + /* Get a new vector on the desired set of CPUs */
> + new_vec = imsic_vector_alloc(old_vec->hwirq, mask_val);
> + if (!new_vec)
> + return -ENOSPC;
> +
> + /* If old vector belongs to the desired CPU then do nothing */
> + if (old_vec->cpu == new_vec->cpu) {
> + imsic_vector_free(new_vec);
> + return IRQ_SET_MASK_OK_DONE;
> + }
You can spare that exercise by checking it before the allocation:
if (cpumask_test_cpu(old_vec->cpu, mask_val))
return IRQ_SET_MASK_OK_DONE;
> +
> + /* Point device to the new vector */
> + imsic_msi_update_msg(d, new_vec);
> +static int imsic_irq_domain_alloc(struct irq_domain *domain,
> + unsigned int virq, unsigned int nr_irqs,
> + void *args)
> +{
> + struct imsic_vector *vec;
> + int hwirq;
> +
> + /* Legacy-MSI or multi-MSI not supported yet. */
What's legacy MSI in that context?
> + if (nr_irqs > 1)
> + return -ENOTSUPP;
> +
> + hwirq = imsic_hwirq_alloc();
> + if (hwirq < 0)
> + return hwirq;
> +
> + vec = imsic_vector_alloc(hwirq, cpu_online_mask);
> + if (!vec) {
> + imsic_hwirq_free(hwirq);
> + return -ENOSPC;
> + }
> +
> + irq_domain_set_info(domain, virq, hwirq,
> + &imsic_irq_base_chip, vec,
> + handle_simple_irq, NULL, NULL);
> + irq_set_noprobe(virq);
> + irq_set_affinity(virq, cpu_online_mask);
> +
> + /*
> + * IMSIC does not implement irq_disable() so Linux interrupt
> + * subsystem will take a lazy approach for disabling an IMSIC
> + * interrupt. This means IMSIC interrupts are left unmasked
> + * upon system suspend and interrupts are not processed
> + * immediately upon system wake up. To tackle this, we disable
> + * the lazy approach for all IMSIC interrupts.
Why? Lazy works perfectly fine even w/o an irq_disable() callback.
> + */
> + irq_set_status_flags(virq, IRQ_DISABLE_UNLAZY);
> +
> +#define MATCH_PLATFORM_MSI BIT(DOMAIN_BUS_PLATFORM_MSI)
You really love macro indirections :)
> +static const struct msi_parent_ops imsic_msi_parent_ops = {
> + .supported_flags = MSI_GENERIC_FLAGS_MASK,
> + .required_flags = MSI_FLAG_USE_DEF_DOM_OPS |
> + MSI_FLAG_USE_DEF_CHIP_OPS,
> + .bus_select_token = DOMAIN_BUS_NEXUS,
> + .bus_select_mask = MATCH_PLATFORM_MSI,
> + .init_dev_msi_info = imsic_init_dev_msi_info,
> +};
> +
> +int imsic_irqdomain_init(void)
> +{
> + struct imsic_global_config *global;
> +
> + if (!imsic || !imsic->fwnode) {
> + pr_err("early driver not probed\n");
> + return -ENODEV;
> + }
> +
> + if (imsic->base_domain) {
> + pr_err("%pfwP: irq domain already created\n", imsic->fwnode);
> + return -ENODEV;
> + }
> +
> + global = &imsic->global;
Please move that assignment down to the usage site. Here it's just a
distraction.
> + /* Create Base IRQ domain */
> + imsic->base_domain = irq_domain_create_tree(imsic->fwnode,
> + &imsic_base_domain_ops, imsic);
> + if (!imsic->base_domain) {
> + pr_err("%pfwP: failed to create IMSIC base domain\n",
> + imsic->fwnode);
> + return -ENOMEM;
> + }
> + imsic->base_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
> + imsic->base_domain->msi_parent_ops = &imsic_msi_parent_ops;
Thanks,
tglx
On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> +#ifdef CONFIG_RISCV_IMSIC_PCI
> +
> +static void imsic_pci_mask_irq(struct irq_data *d)
> +{
> + pci_msi_mask_irq(d);
> + irq_chip_mask_parent(d);
> +}
> +
> +static void imsic_pci_unmask_irq(struct irq_data *d)
> +{
> + pci_msi_unmask_irq(d);
> + irq_chip_unmask_parent(d);
That's asymmetric vs. mask().
Thanks,
tglx
On Fri, Feb 16 2024 at 22:41, Anup Patel wrote:
> On Fri, Feb 16, 2024 at 9:03 PM Thomas Gleixner <[email protected]> wrote:
>> I don't think that removing the setup protection is correct.
>>
>> Assume you have maxcpus=N on the kernel command line, then the above
>> for_each_online_cpu() loop would result in cpuhp_setup == true when the
>> instances for the not onlined CPUs are set up, no?
>
> A platform can have multiple PLIC instances where each PLIC
> instance targets a subset of HARTs (or CPUs).
>
> Previously (before this patch), we were probing PLIC very early so on
> a platform with multiple PLIC instances, we need to ensure that cpuhp
> setup is done only after PLIC context associated with boot CPU is
> initialized hence the plic_cpuhp_setup_done check.
>
> This patch converts PLIC driver into a platform driver so now PLIC
> instances are probed after all available CPUs are brought-up. In this
> case, the cpuhp setup must be done only after PLIC context of all
> available CPUs are initialized otherwise some of the CPUs crash
> in plic_starting_cpu() due to lack of PLIC context initialization.
You're missing the point.
Assume you have 8 CPUs and 2 PLIC instances one for CPU0-3 and one for
CPU4-7.
Add "maxcpus=4" on the kernel command line, then only the first 4 CPUs
are brought up.
So at probe time cpu_online_mask has bit 0,1,2,3 set.
When the first PLIC is probed, the loop which checks the context for each
online CPU will not clear cpuhp_setup and the hotplug state is installed.
Now the second PLIC is probed (the one for the offline CPUs 4-7) and the
loop will again not clear cpuhp_setup and it tries to install the state
again, no?
Thanks,
tglx
On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> We extend the existing APLIC irqchip driver to support MSI-mode for
> RISC-V platforms having both wired interrupts and MSIs.
We? Just s/We//
> +
> +static void aplic_msi_irq_unmask(struct irq_data *d)
> +{
> + aplic_irq_unmask(d);
> + irq_chip_unmask_parent(d);
> +}
> +
> +static void aplic_msi_irq_mask(struct irq_data *d)
> +{
> + aplic_irq_mask(d);
> + irq_chip_mask_parent(d);
> +}
Again asymmetric vs. unmask()
> +static void aplic_msi_irq_eoi(struct irq_data *d)
> +{
> + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> + u32 reg_off, reg_mask;
> +
> + /*
> + * EOI handling is required only for level-triggered
> + * interrupts in APLIC MSI mode.
> + */
> +
> + reg_off = APLIC_CLRIP_BASE + ((d->hwirq / APLIC_IRQBITS_PER_REG) * 4);
> + reg_mask = BIT(d->hwirq % APLIC_IRQBITS_PER_REG);
> + switch (irqd_get_trigger_type(d)) {
> + case IRQ_TYPE_LEVEL_LOW:
> + if (!(readl(priv->regs + reg_off) & reg_mask))
> + writel(d->hwirq, priv->regs + APLIC_SETIPNUM_LE);
A comment what this condition is for would be nice.
Thanks,
tglx
On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> +static int aplic_direct_irqdomain_translate(struct irq_domain *d,
> + struct irq_fwspec *fwspec,
> + unsigned long *hwirq,
> + unsigned int *type)
Please align the arguments to the first argument of the first line and
use the 100 characters, i.e.
static int aplic_direct_irqdomain_translate(struct irq_domain *d, struct irq_fwspec *fwspec,
unsigned long *hwirq, unsigned int *type)
{
All over the place.
> +{
> + struct aplic_priv *priv = d->host_data;
> +
> + return aplic_irqdomain_translate(fwspec, priv->gsi_base,
> + hwirq, type);
> +}
> +
> +static int aplic_direct_irqdomain_alloc(struct irq_domain *domain,
> + unsigned int virq, unsigned int nr_irqs,
> + void *arg)
> +{
> + int i, ret;
> + unsigned int type;
> + irq_hw_number_t hwirq;
> + struct irq_fwspec *fwspec = arg;
> + struct aplic_priv *priv = domain->host_data;
> + struct aplic_direct *direct =
> + container_of(priv, struct aplic_direct, priv);
Variable ordering. Please make this consistent according to documentation.
> + ret = aplic_irqdomain_translate(fwspec, priv->gsi_base,
> + &hwirq, &type);
> + if (ret)
> + return ret;
> +
> + for (i = 0; i < nr_irqs; i++) {
> + irq_domain_set_info(domain, virq + i, hwirq + i,
> + &aplic_direct_chip, priv,
> + handle_fasteoi_irq, NULL, NULL);
> + irq_set_affinity(virq + i, &direct->lmask);
> + /* See the reason described in aplic_msi_irqdomain_alloc() */
I still have to understand that "reason". :)
> + irq_set_status_flags(virq + i, IRQ_DISABLE_UNLAZY);
> + }
Thanks,
tglx
Anup!
On Thu, Feb 15 2024 at 20:59, Thomas Gleixner wrote:
> I'm going over the rest of the series after I dealt with my other patch
> backlog.
Aside from the nitpicks I had, this looks pretty reasonable.
Thanks,
tglx
On Sat, Feb 17, 2024 at 1:52 AM Thomas Gleixner <[email protected]> wrote:
>
> On Fri, Feb 16 2024 at 22:41, Anup Patel wrote:
> > On Fri, Feb 16, 2024 at 9:03 PM Thomas Gleixner <[email protected]> wrote:
> >> I don't think that removing the setup protection is correct.
> >>
> >> Assume you have maxcpus=N on the kernel command line, then the above
> >> for_each_online_cpu() loop would result in cpuhp_setup == true when the
> >> instances for the not onlined CPUs are set up, no?
> >
> > A platform can have multiple PLIC instances where each PLIC
> > instance targets a subset of HARTs (or CPUs).
> >
> > Previously (before this patch), we were probing PLIC very early so on
> > a platform with multiple PLIC instances, we need to ensure that cpuhp
> > setup is done only after PLIC context associated with boot CPU is
> > initialized hence the plic_cpuhp_setup_done check.
> >
> > This patch converts PLIC driver into a platform driver so now PLIC
> > instances are probed after all available CPUs are brought-up. In this
> > case, the cpuhp setup must be done only after PLIC context of all
> > available CPUs are initialized otherwise some of the CPUs crash
> > in plic_starting_cpu() due to lack of PLIC context initialization.
>
> You're missing the point.
>
> Assume you have 8 CPUs and 2 PLIC instances one for CPU0-3 and one for
> CPU4-7.
>
> Add "maxcpus=4" on the kernel command line, then only the first 4 CPUs
> are brought up.
>
> So at probe time cpu_online_mask has bit 0,1,2,3 set.
>
> When the first PLIC is probed, the loop which checks the context for each
> online CPU will not clear cpuhp_setup and the hotplug state is installed.
>
> Now the second PLIC is probed (the one for the offline CPUs 4-7) and the
> loop will again not clear cpuhp_setup and it tries to install the state
> again, no?
Ahh, yes. Good catch.
For the "maxcpus" in kernel command-line, we can't rely on the
cpu_online_mask. I will preserve the plic_cpuhp_setup_done
check in the next revision.
Regards,
Anup
On Sat, Feb 17, 2024 at 12:10 AM Thomas Gleixner <[email protected]> wrote:
>
> On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> > +
> > +#ifdef CONFIG_SMP
> > +static irqreturn_t imsic_local_sync_handler(int irq, void *data)
> > +{
> > + imsic_local_sync();
> > + return IRQ_HANDLED;
> > +}
> > +
> > +static void imsic_ipi_send(unsigned int cpu)
> > +{
> > + struct imsic_local_config *local =
> > + per_cpu_ptr(imsic->global.local, cpu);
>
> Let it stick out. We switched to line length 100 quite some time
> ago. Applies to the rest of the series too.
Okay, I will update.
>
> > + writel_relaxed(IMSIC_IPI_ID, local->msi_va);
> > +}
> > +
> > +static void imsic_ipi_starting_cpu(void)
> > +{
> > + /* Enable IPIs for current CPU. */
> > + __imsic_id_set_enable(IMSIC_IPI_ID);
> > +
> > + /* Enable virtual IPI used for IMSIC ID synchronization */
> > + enable_percpu_irq(imsic->ipi_virq, 0);
> > +}
> > +
> > +static void imsic_ipi_dying_cpu(void)
> > +{
> > + /*
> > + * Disable virtual IPI used for IMSIC ID synchronization so
> > + * that we don't receive ID synchronization requests.
> > + */
> > + disable_percpu_irq(imsic->ipi_virq);
>
> Shouldn't this disable the hardware too, i.e.
>
> __imsic_id_clear_enable()
>
> ?
Yes, it should, but somehow I missed it and never saw any issue.
I will update.
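Something along these lines (untested sketch), mirroring what
imsic_ipi_starting_cpu() enables:

static void imsic_ipi_dying_cpu(void)
{
	/* Disable the IPI ID in the IMSIC hardware as well */
	__imsic_id_clear_enable(IMSIC_IPI_ID);

	/*
	 * Disable virtual IPI used for IMSIC ID synchronization so
	 * that we don't receive ID synchronization requests.
	 */
	disable_percpu_irq(imsic->ipi_virq);
}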
>
> > +}
> > +
> > +static int __init imsic_ipi_domain_init(void)
> > +{
> > + int virq;
> > +
> > + /* Create IMSIC IPI multiplexing */
> > + virq = ipi_mux_create(IMSIC_NR_IPI, imsic_ipi_send);
> > + if (virq <= 0)
> > + return (virq < 0) ? virq : -ENOMEM;
> > + imsic->ipi_virq = virq;
> > +
> > + /* First vIRQ is used for IMSIC ID synchronization */
> > + virq = request_percpu_irq(imsic->ipi_virq, imsic_local_sync_handler,
> > + "riscv-imsic-lsync", imsic->global.local);
> > + if (virq)
> > + return virq;
>
> Please use a separate 'ret' variable. I had to read this 3 times to make
> sense of it.
Okay, I will update.
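i.e. with a separate 'ret' for the request_percpu_irq() return value
(sketch):

	int virq, ret;

	...

	/* First vIRQ is used for IMSIC ID synchronization */
	ret = request_percpu_irq(imsic->ipi_virq, imsic_local_sync_handler,
				 "riscv-imsic-lsync", imsic->global.local);
	if (ret)
		return ret;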
>
> > + irq_set_status_flags(imsic->ipi_virq, IRQ_HIDDEN);
> > + imsic->ipi_lsync_desc = irq_to_desc(imsic->ipi_virq);
>
> What's so special about this particular IPI that it can't be handled
> like all the other IPIs?
We use this special under-the-hood IPI to synchronize
IRQ enable/disable state and IRQ movement across CPUs.
x86 takes a lazier approach using a per-CPU timer, so in
the next revision I will move to a similar approach. This means
both "ipi_virq" and "ipi_lsync_desc" will go away.
>
> > +static int __init imsic_early_probe(struct fwnode_handle *fwnode)
> > +{
> > + int rc;
> > + struct irq_domain *domain;
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations
Okay, I will update.
>
> > +
> > + /* Find parent domain and register chained handler */
> > + domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
> > + DOMAIN_BUS_ANY);
> > + if (!domain) {
> > + pr_err("%pfwP: Failed to find INTC domain\n", fwnode);
> > + return -ENOENT;
> > + }
> > + imsic_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
> > + if (!imsic_parent_irq) {
> > + pr_err("%pfwP: Failed to create INTC mapping\n", fwnode);
> > + return -ENOENT;
> > + }
> > + irq_set_chained_handler(imsic_parent_irq, imsic_handle_irq);
> > +
> > + /* Initialize IPI domain */
> > + rc = imsic_ipi_domain_init();
> > + if (rc) {
> > + pr_err("%pfwP: Failed to initialize IPI domain\n", fwnode);
> > + return rc;
>
> Leaves the chained handler around and enabled.
Okay, I will register the chained handler after imsic_ipi_domain_init().
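i.e. (sketch):

	/* Initialize IPI domain before taking any external interrupts */
	rc = imsic_ipi_domain_init();
	if (rc) {
		pr_err("%pfwP: Failed to initialize IPI domain\n", fwnode);
		return rc;
	}

	/* Register chained handler only after the IPI domain is ready */
	irq_set_chained_handler(imsic_parent_irq, imsic_handle_irq);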
>
> > diff --git a/drivers/irqchip/irq-riscv-imsic-state.c b/drivers/irqchip/irq-riscv-imsic-state.c
> > +
> > +#define imsic_csr_write(__c, __v) \
> > +do { \
> > + csr_write(CSR_ISELECT, __c); \
> > + csr_write(CSR_IREG, __v); \
> > +} while (0)
>
> Any reason why these macros can't be inlines?
No particular reason. I am fine with both macros and inline functions.
I will update in the next revision.
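E.g. (sketch; the set/clear/read variants would follow the same pattern,
which works because only the CSR number passed to csr_write() has to be a
compile-time constant):

static inline void imsic_csr_write(unsigned long iselect, unsigned long val)
{
	csr_write(CSR_ISELECT, iselect);
	csr_write(CSR_IREG, val);
}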
>
> > +const struct imsic_global_config *imsic_get_global_config(void)
> > +{
> > + return imsic ? &imsic->global : NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(imsic_get_global_config);
>
> Why is this exported?
This is for the KVM RISC-V module. We have follow-up
KVM RISC-V patches which need to know the IMSIC global
configuration so that KVM can assign IMSIC guest files to a
Guest/VM.
>
> > +#define __imsic_id_read_clear_enabled(__id) \
> > + __imsic_eix_read_clear((__id), false)
> > +#define __imsic_id_read_clear_pending(__id) \
> > + __imsic_eix_read_clear((__id), true)
>
> Please use inlines.
Okay, I will update.
>
> > +void __imsic_eix_update(unsigned long base_id,
> > + unsigned long num_id, bool pend, bool val)
> > +{
> > + unsigned long i, isel, ireg;
> > + unsigned long id = base_id, last_id = base_id + num_id;
> > +
> > + while (id < last_id) {
> > + isel = id / BITS_PER_LONG;
> > + isel *= BITS_PER_LONG / IMSIC_EIPx_BITS;
> > + isel += (pend) ? IMSIC_EIP0 : IMSIC_EIE0;
> > +
> > + ireg = 0;
> > + for (i = id & (__riscv_xlen - 1);
> > + (id < last_id) && (i < __riscv_xlen); i++) {
> > + ireg |= BIT(i);
> > + id++;
> > + }
>
> This lacks a comment what this is doing.
Okay, I will add a comment block.
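Something along these lines, perhaps (wording to be refined):

	while (id < last_id) {
		/*
		 * Each EIEx/EIPx register is IMSIC_EIPx_BITS wide and,
		 * when XLEN is 64, an even/odd register pair is accessed
		 * as one XLEN-wide CSR. Compute the ISELECT number of
		 * the register containing 'id'.
		 */
		isel = id / BITS_PER_LONG;
		isel *= BITS_PER_LONG / IMSIC_EIPx_BITS;
		isel += pend ? IMSIC_EIP0 : IMSIC_EIE0;

		/*
		 * Collect all ids in [id, last_id) which land in this
		 * register into one mask so that they can be set or
		 * cleared with a single indirect CSR access.
		 */
		ireg = 0;
		for (i = id & (__riscv_xlen - 1);
		     (id < last_id) && (i < __riscv_xlen); i++) {
			ireg |= BIT(i);
			id++;
		}
		...
	}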
>
> > +
> > + /*
> > + * The IMSIC EIEx and EIPx registers are indirectly
> > + * accessed via using ISELECT and IREG CSRs so we
> > + * need to access these CSRs without getting preempted.
> > + *
> > + * All existing users of this function call this
> > + * function with local IRQs disabled so we don't
> > + * need to do anything special here.
> > + */
> > + if (val)
> > + imsic_csr_set(isel, ireg);
> > + else
> > + imsic_csr_clear(isel, ireg);
> > + }
> > +}
> > +
> > +void imsic_local_sync(void)
> > +{
> > + struct imsic_local_priv *lpriv = this_cpu_ptr(imsic->lpriv);
> > + struct imsic_local_config *mlocal;
> > + struct imsic_vector *mvec;
> > + unsigned long flags;
> > + int i;
> > +
> > + raw_spin_lock_irqsave(&lpriv->ids_lock, flags);
> > + for (i = 1; i <= imsic->global.nr_ids; i++) {
> > + if (i == IMSIC_IPI_ID)
> > + continue;
> > +
> > + if (test_bit(i, lpriv->ids_enabled_bitmap))
> > + __imsic_id_set_enable(i);
> > + else
> > + __imsic_id_clear_enable(i);
> > +
> > + mvec = lpriv->ids_move[i];
> > + lpriv->ids_move[i] = NULL;
> > + if (mvec) {
> > + if (__imsic_id_read_clear_pending(i)) {
> > + mlocal = per_cpu_ptr(imsic->global.local,
> > + mvec->cpu);
> > + writel_relaxed(mvec->local_id, mlocal->msi_va);
> > + }
> > +
> > + imsic_vector_free(&lpriv->vectors[i]);
> > + }
>
> Again an uncommented piece of magic which you will have forgotten what
> it does 3 month down the road :)
Sure, I will add a comment block.
>
> > +
> > + }
> > + raw_spin_unlock_irqrestore(&lpriv->ids_lock, flags);
> > +}
> > +
> > +void imsic_local_delivery(bool enable)
> > +{
> > + if (enable) {
> > + imsic_csr_write(IMSIC_EITHRESHOLD, IMSIC_ENABLE_EITHRESHOLD);
> > + imsic_csr_write(IMSIC_EIDELIVERY, IMSIC_ENABLE_EIDELIVERY);
> > + return;
> > + }
> > +
> > + imsic_csr_write(IMSIC_EIDELIVERY, IMSIC_DISABLE_EIDELIVERY);
> > + imsic_csr_write(IMSIC_EITHRESHOLD, IMSIC_DISABLE_EITHRESHOLD);
> > +}
> > +
> > +#ifdef CONFIG_SMP
> > +static void imsic_remote_sync(unsigned int cpu)
> > +{
> > + /*
> > + * We simply inject ID synchronization IPI to a target CPU
> > + * if it is not same as the current CPU. The ipi_send_mask()
> > + * implementation of IPI mux will inject ID synchronization
> > + * IPI only for CPUs that have enabled it so offline CPUs
> > + * won't receive IPI. An offline CPU will unconditionally
> > + * synchronize IDs through imsic_starting_cpu() when the
> > + * CPU is brought up.
> > + */
> > + if (cpu_online(cpu)) {
> > + if (cpu != smp_processor_id())
> > + __ipi_send_mask(imsic->ipi_lsync_desc, cpumask_of(cpu));
>
> Still wondering why this can't use the regular API. There might be a
> reason, but then it wants to be documented.
As mentioned above, the "ipi_virq" and "ipi_lsync_desc" will
be replaced by a per-CPU timer in the next revision.
>
> > + else
> > + imsic_local_sync();
> > + }
> > +}
> > +#else
> > +static inline void imsic_remote_sync(unsigned int cpu)
> > +{
> > + imsic_local_sync();
> > +}
> > +#endif
> > +
> > +void imsic_vector_mask(struct imsic_vector *vec)
> > +{
> > + struct imsic_local_priv *lpriv;
> > + unsigned long flags;
> > +
> > + lpriv = per_cpu_ptr(imsic->lpriv, vec->cpu);
> > + if (WARN_ON(&lpriv->vectors[vec->local_id] != vec))
> > + return;
> > +
> > + raw_spin_lock_irqsave(&lpriv->ids_lock, flags);
>
> AFAICT, this is used from an irqchip callback:
>
> static void imsic_irq_mask(struct irq_data *d)
> {
> imsic_vector_mask(irq_data_get_irq_chip_data(d));
> }
>
> So no need to use irqsave() here. Those callbacks run always with
> interrupts disabled when called from the core.
Okay, I will update.
>
> > +void imsic_vector_move(struct imsic_vector *old_vec,
> > + struct imsic_vector *new_vec)
> > +{
> > + struct imsic_local_priv *old_lpriv, *new_lpriv;
> > + unsigned long flags, flags1;
> > +
> > + if (WARN_ON(old_vec->cpu == new_vec->cpu))
> > + return;
> > +
> > + old_lpriv = per_cpu_ptr(imsic->lpriv, old_vec->cpu);
> > + if (WARN_ON(&old_lpriv->vectors[old_vec->local_id] != old_vec))
> > + return;
> > +
> > + new_lpriv = per_cpu_ptr(imsic->lpriv, new_vec->cpu);
> > + if (WARN_ON(&new_lpriv->vectors[new_vec->local_id] != new_vec))
> > + return;
> > +
> > + raw_spin_lock_irqsave(&old_lpriv->ids_lock, flags);
> > + raw_spin_lock_irqsave(&new_lpriv->ids_lock, flags1);
>
> Lockdep should yell at you for this, rightfully so. And not only because
> of the missing nested() annotation.
>
> Assume there are two CPUs setting affinity for two different interrupts.
>
> CPU0 moves an interrupt to CPU1 and CPU1 moves another interrupt to
> CPU0. The resulting lock order is:
>
> CPU0 CPU1
> lock(lpriv[CPU0]); lock(lpriv[CPU1]);
> lock(lpriv[CPU1]); lock(lpriv[CPU0]);
>
> a classic ABBA deadlock.
>
> You need to take those locks always in the same order. Look at
> double_raw_lock() in kernel/sched/sched.h.
I have simplified the locking to avoid these nested locks, so the
next revision will be much simpler without any lock nesting.
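For the record, if the nesting had stayed, a fixed-order helper along the
lines of double_raw_lock() would look roughly like this (sketch; the helper
name is made up):

static void imsic_ids_double_lock(struct imsic_local_priv *lp1,
				  struct imsic_local_priv *lp2)
{
	/* Always take the lower-address lock first to avoid ABBA */
	if (lp1 > lp2)
		swap(lp1, lp2);

	raw_spin_lock(&lp1->ids_lock);
	raw_spin_lock_nested(&lp2->ids_lock, SINGLE_DEPTH_NESTING);
}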
>
> > + /* Unmask the new vector entry */
> > + if (test_bit(old_vec->local_id, old_lpriv->ids_enabled_bitmap))
> > + bitmap_set(new_lpriv->ids_enabled_bitmap,
> > + new_vec->local_id, 1);
>
> Either make that one line or please add brackets. See:
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#bracket-rules
Okay, I will update.
>
> > +static int __init imsic_local_init(void)
> > +{
> > + struct imsic_global_config *global = &imsic->global;
> > + struct imsic_local_priv *lpriv;
> > + struct imsic_vector *vec;
> > + int cpu, i;
> > +
> > + /* Allocate per-CPU private state */
> > + imsic->lpriv = alloc_percpu(typeof(*(imsic->lpriv)));
> > + if (!imsic->lpriv)
> > + return -ENOMEM;
> > +
> > + /* Setup per-CPU private state */
> > + for_each_possible_cpu(cpu) {
> > + lpriv = per_cpu_ptr(imsic->lpriv, cpu);
> > +
> > + raw_spin_lock_init(&lpriv->ids_lock);
> > +
> > + /* Allocate enabled bitmap */
> > + lpriv->ids_enabled_bitmap = bitmap_zalloc(global->nr_ids + 1,
> > + GFP_KERNEL);
> > + if (!lpriv->ids_enabled_bitmap) {
> > + imsic_local_cleanup();
> > + return -ENOMEM;
> > + }
> > +
> > + /* Allocate move array */
> > + lpriv->ids_move = kcalloc(global->nr_ids + 1,
> > + sizeof(*lpriv->ids_move), GFP_KERNEL);
> > + if (!lpriv->ids_move) {
> > + imsic_local_cleanup();
> > + return -ENOMEM;
> > + }
> > +
> > + /* Allocate vector array */
> > + lpriv->vectors = kcalloc(global->nr_ids + 1,
> > + sizeof(*lpriv->vectors), GFP_KERNEL);
> > + if (!lpriv->vectors) {
> > + imsic_local_cleanup();
> > + return -ENOMEM;
>
> Third instance of the same pattern. goto cleanup; perhaps?
Okay, I will add goto here.
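i.e. something like this (sketch):

	for_each_possible_cpu(cpu) {
		lpriv = per_cpu_ptr(imsic->lpriv, cpu);

		raw_spin_lock_init(&lpriv->ids_lock);

		/* Allocate enabled bitmap */
		lpriv->ids_enabled_bitmap = bitmap_zalloc(global->nr_ids + 1, GFP_KERNEL);
		if (!lpriv->ids_enabled_bitmap)
			goto fail_local_cleanup;

		/* Allocate move array */
		lpriv->ids_move = kcalloc(global->nr_ids + 1, sizeof(*lpriv->ids_move), GFP_KERNEL);
		if (!lpriv->ids_move)
			goto fail_local_cleanup;

		/* Allocate vector array */
		lpriv->vectors = kcalloc(global->nr_ids + 1, sizeof(*lpriv->vectors), GFP_KERNEL);
		if (!lpriv->vectors)
			goto fail_local_cleanup;

		...
	}

	return 0;

fail_local_cleanup:
	imsic_local_cleanup();
	return -ENOMEM;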
>
> > +struct imsic_vector *imsic_vector_alloc(unsigned int hwirq,
> > + const struct cpumask *mask)
> > +{
> > + struct imsic_vector *vec = NULL;
> > + struct imsic_local_priv *lpriv;
> > + unsigned long flags;
> > + unsigned int cpu;
> > + int local_id;
> > +
> > + raw_spin_lock_irqsave(&imsic->matrix_lock, flags);
> > + local_id = irq_matrix_alloc(imsic->matrix, mask, false, &cpu);
> > + raw_spin_unlock_irqrestore(&imsic->matrix_lock, flags);
> > + if (local_id < 0)
> > + return NULL;
> > +
> > + lpriv = per_cpu_ptr(imsic->lpriv, cpu);
> > + vec = &lpriv->vectors[local_id];
> > + vec->hwirq = hwirq;
> > +
> > + return vec;
> > +}
>
> ...
>
> > +int imsic_hwirq_alloc(void)
> > +{
> > + int ret;
> > + unsigned long flags;
> > +
> > + raw_spin_lock_irqsave(&imsic->hwirqs_lock, flags);
> > + ret = bitmap_find_free_region(imsic->hwirqs_used_bitmap,
> > + imsic->nr_hwirqs, 0);
> > + raw_spin_unlock_irqrestore(&imsic->hwirqs_lock, flags);
> > +
> > + return ret;
> > +}
>
> This part is just to create a unique hwirq number, right?
Yes, this is only for allocating a unique hwirq. We can directly use
virq instead of hwirq, so this hwirq allocation/management will
go away in the next revision.
>
> > +
> > + /* Find number of guest index bits in MSI address */
> > + rc = of_property_read_u32(to_of_node(fwnode),
> > + "riscv,guest-index-bits",
> > + &global->guest_index_bits);
> > + if (rc)
> > + global->guest_index_bits = 0;
>
> So here you get the index bits, but then 50 lines further down you do
> sanity checking. Wouldn't it make sense to do that right here?
>
> Same for the other bits.
This is intentional because we already have an AIA ACPI series
where this helps reduce the number of "if (acpi_disabled)"
checks.
>
> > +
> > +/*
> > + * The IMSIC driver uses 1 IPI for ID synchronization and
> > + * arch/riscv/kernel/smp.c requires 6 IPIs so we fix the
> > + * total number of IPIs to 8.
> > + */
> > +#define IMSIC_IPI_ID 1
> > +#define IMSIC_NR_IPI 8
> > +
> > +struct imsic_vector {
> > + /* Fixed details of the vector */
> > + unsigned int cpu;
> > + unsigned int local_id;
> > + /* Details saved by driver in the vector */
> > + unsigned int hwirq;
> > +};
> > +
> > +struct imsic_local_priv {
> > + /* Local state of interrupt identities */
> > + raw_spinlock_t ids_lock;
> > + unsigned long *ids_enabled_bitmap;
> > + struct imsic_vector **ids_move;
> > +
> > + /* Local vector table */
> > + struct imsic_vector *vectors;
>
> Please make those structs tabular:
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#struct-declarations-and-initializers
Okay, I will update.
>
> > +void __imsic_eix_update(unsigned long base_id,
> > + unsigned long num_id, bool pend, bool val);
> > +
> > +#define __imsic_id_set_enable(__id) \
> > + __imsic_eix_update((__id), 1, false, true)
> > +#define __imsic_id_clear_enable(__id) \
> > + __imsic_eix_update((__id), 1, false, false)
>
> inlines please.
Okay, I will update.
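i.e. (sketch):

static inline void __imsic_id_set_enable(unsigned long id)
{
	__imsic_eix_update(id, 1, false, true);
}

static inline void __imsic_id_clear_enable(unsigned long id)
{
	__imsic_eix_update(id, 1, false, false);
}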
Regards,
Anup
On Sat, Feb 17, 2024 at 1:42 AM Thomas Gleixner <[email protected]> wrote:
>
> On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> > +static int imsic_cpu_page_phys(unsigned int cpu,
> > + unsigned int guest_index,
> > + phys_addr_t *out_msi_pa)
> > +{
> > + struct imsic_global_config *global;
> > + struct imsic_local_config *local;
> > +
> > + global = &imsic->global;
> > + local = per_cpu_ptr(global->local, cpu);
> > +
> > + if (BIT(global->guest_index_bits) <= guest_index)
> > + return -EINVAL;
>
> As the callsite does not care about the return value, just make this
> function boolean and return true on success.
Okay, I will update.
>
> > + if (out_msi_pa)
> > + *out_msi_pa = local->msi_pa +
> > + (guest_index * IMSIC_MMIO_PAGE_SZ);
> > +
> > + return 0;
> > +}
> > +
> > +static void imsic_irq_mask(struct irq_data *d)
> > +{
> > + imsic_vector_mask(irq_data_get_irq_chip_data(d));
> > +}
> > +
> > +static void imsic_irq_unmask(struct irq_data *d)
> > +{
> > + imsic_vector_unmask(irq_data_get_irq_chip_data(d));
> > +}
> > +
> > +static int imsic_irq_retrigger(struct irq_data *d)
> > +{
> > + struct imsic_vector *vec = irq_data_get_irq_chip_data(d);
> > + struct imsic_local_config *local;
> > +
> > + if (WARN_ON(vec == NULL))
> > + return -ENOENT;
> > +
> > + local = per_cpu_ptr(imsic->global.local, vec->cpu);
> > + writel(vec->local_id, local->msi_va);
> > + return 0;
> > +}
> > +
> > +static void imsic_irq_compose_vector_msg(struct imsic_vector *vec,
> > + struct msi_msg *msg)
> > +{
> > + phys_addr_t msi_addr;
> > + int err;
> > +
> > + if (WARN_ON(vec == NULL))
> > + return;
> > +
> > + err = imsic_cpu_page_phys(vec->cpu, 0, &msi_addr);
> > + if (WARN_ON(err))
> > + return;
>
> if (WARN_ON(!imsic_cpu_page_phys(...)))
> return
> Hmm?
Okay, I will update like you suggested.
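i.e. roughly (sketch):

static bool imsic_cpu_page_phys(unsigned int cpu, unsigned int guest_index,
				phys_addr_t *out_msi_pa)
{
	struct imsic_global_config *global = &imsic->global;
	struct imsic_local_config *local = per_cpu_ptr(global->local, cpu);

	if (BIT(global->guest_index_bits) <= guest_index)
		return false;

	if (out_msi_pa)
		*out_msi_pa = local->msi_pa + (guest_index * IMSIC_MMIO_PAGE_SZ);

	return true;
}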
>
> > +
> > + msg->address_hi = upper_32_bits(msi_addr);
> > + msg->address_lo = lower_32_bits(msi_addr);
> > + msg->data = vec->local_id;
> > +}
> > +
> > +static void imsic_irq_compose_msg(struct irq_data *d, struct msi_msg *msg)
> > +{
> > + imsic_irq_compose_vector_msg(irq_data_get_irq_chip_data(d), msg);
> > +}
> > +
> > +#ifdef CONFIG_SMP
> > +static void imsic_msi_update_msg(struct irq_data *d, struct imsic_vector *vec)
> > +{
> > + struct msi_msg msg[2] = { [1] = { }, };
> > +
> > + imsic_irq_compose_vector_msg(vec, msg);
> > + irq_data_get_irq_chip(d)->irq_write_msi_msg(d, msg);
> > +}
> > +
> > +static int imsic_irq_set_affinity(struct irq_data *d,
> > + const struct cpumask *mask_val,
> > + bool force)
> > +{
> > + struct imsic_vector *old_vec, *new_vec;
> > + struct irq_data *pd = d->parent_data;
> > +
> > + old_vec = irq_data_get_irq_chip_data(pd);
> > + if (WARN_ON(old_vec == NULL))
> > + return -ENOENT;
> > +
> > + /* Get a new vector on the desired set of CPUs */
> > + new_vec = imsic_vector_alloc(old_vec->hwirq, mask_val);
> > + if (!new_vec)
> > + return -ENOSPC;
> > +
> > + /* If old vector belongs to the desired CPU then do nothing */
> > + if (old_vec->cpu == new_vec->cpu) {
> > + imsic_vector_free(new_vec);
> > + return IRQ_SET_MASK_OK_DONE;
> > + }
>
> You can spare that exercise by checking it before the allocation:
>
> if (cpumask_test_cpu(old_vec->cpu, mask_val))
> return IRQ_SET_MASK_OK_DONE;
Okay, I will update.
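i.e. (sketch):

	old_vec = irq_data_get_irq_chip_data(pd);
	if (WARN_ON(!old_vec))
		return -ENOENT;

	/* If the old vector is already on one of the desired CPUs, do nothing */
	if (cpumask_test_cpu(old_vec->cpu, mask_val))
		return IRQ_SET_MASK_OK_DONE;

	/* Otherwise get a new vector on the desired set of CPUs */
	new_vec = imsic_vector_alloc(old_vec->hwirq, mask_val);
	if (!new_vec)
		return -ENOSPC;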
>
> > +
> > + /* Point device to the new vector */
> > + imsic_msi_update_msg(d, new_vec);
>
> > +static int imsic_irq_domain_alloc(struct irq_domain *domain,
> > + unsigned int virq, unsigned int nr_irqs,
> > + void *args)
> > +{
> > + struct imsic_vector *vec;
> > + int hwirq;
> > +
> > + /* Legacy-MSI or multi-MSI not supported yet. */
>
> What's legacy MSI in that context?
Legacy-MSI is the MSI support from PCI v2.2, where the
number of MSIs allocated by a device is either 1, 2, 4,
8, 16, or 32 and the data written is <data_word> + <irqnum>.
>
> > + if (nr_irqs > 1)
> > + return -ENOTSUPP;
> > +
> > + hwirq = imsic_hwirq_alloc();
> > + if (hwirq < 0)
> > + return hwirq;
> > +
> > + vec = imsic_vector_alloc(hwirq, cpu_online_mask);
> > + if (!vec) {
> > + imsic_hwirq_free(hwirq);
> > + return -ENOSPC;
> > + }
> > +
> > + irq_domain_set_info(domain, virq, hwirq,
> > + &imsic_irq_base_chip, vec,
> > + handle_simple_irq, NULL, NULL);
> > + irq_set_noprobe(virq);
> > + irq_set_affinity(virq, cpu_online_mask);
> > +
> > + /*
> > + * IMSIC does not implement irq_disable() so Linux interrupt
> > + * subsystem will take a lazy approach for disabling an IMSIC
> > + * interrupt. This means IMSIC interrupts are left unmasked
> > + * upon system suspend and interrupts are not processed
> > + * immediately upon system wake up. To tackle this, we disable
> > + * the lazy approach for all IMSIC interrupts.
>
> Why? Lazy works perfectly fine even w/o an irq_disable() callback.
This was suggested by the SiFive folks. I am also not sure why we
need this. For now, I will drop this and bring it back as a separate
patch if required.
>
> > + */
> > + irq_set_status_flags(virq, IRQ_DISABLE_UNLAZY);
>
> > +
> > +#define MATCH_PLATFORM_MSI BIT(DOMAIN_BUS_PLATFORM_MSI)
>
> You really love macro indirections :)
This is for consistency with MATCH_PCI_MSI introduced by the
subsequent patch.
Also, this is inspired by your ARM GIC patches:
https://lore.kernel.org/linux-arm-kernel/[email protected]/
https://lore.kernel.org/linux-arm-kernel/[email protected]/
https://lore.kernel.org/linux-arm-kernel/[email protected]/
https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> > +static const struct msi_parent_ops imsic_msi_parent_ops = {
> > + .supported_flags = MSI_GENERIC_FLAGS_MASK,
> > + .required_flags = MSI_FLAG_USE_DEF_DOM_OPS |
> > + MSI_FLAG_USE_DEF_CHIP_OPS,
> > + .bus_select_token = DOMAIN_BUS_NEXUS,
> > + .bus_select_mask = MATCH_PLATFORM_MSI,
> > + .init_dev_msi_info = imsic_init_dev_msi_info,
> > +};
> > +
> > +int imsic_irqdomain_init(void)
> > +{
> > + struct imsic_global_config *global;
> > +
> > + if (!imsic || !imsic->fwnode) {
> > + pr_err("early driver not probed\n");
> > + return -ENODEV;
> > + }
> > +
> > + if (imsic->base_domain) {
> > + pr_err("%pfwP: irq domain already created\n", imsic->fwnode);
> > + return -ENODEV;
> > + }
> > +
> > + global = &imsic->global;
>
> Please move that assignment down to the usage site. Here it's just a
> distraction.
Okay, I will update.
>
> > + /* Create Base IRQ domain */
> > + imsic->base_domain = irq_domain_create_tree(imsic->fwnode,
> > + &imsic_base_domain_ops, imsic);
> > + if (!imsic->base_domain) {
> > + pr_err("%pfwP: failed to create IMSIC base domain\n",
> > + imsic->fwnode);
> > + return -ENOMEM;
> > + }
> > + imsic->base_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
> > + imsic->base_domain->msi_parent_ops = &imsic_msi_parent_ops;
>
Regards,
Anup
On Sat, Feb 17, 2024 at 1:44 AM Thomas Gleixner <[email protected]> wrote:
>
> On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> > +#ifdef CONFIG_RISCV_IMSIC_PCI
> > +
> > +static void imsic_pci_mask_irq(struct irq_data *d)
> > +{
> > + pci_msi_mask_irq(d);
> > + irq_chip_mask_parent(d);
> > +}
> > +
> > +static void imsic_pci_unmask_irq(struct irq_data *d)
> > +{
> > + pci_msi_unmask_irq(d);
> > + irq_chip_unmask_parent(d);
>
> That's asymmetric vs. mask().
Yes, this needs to be symmetric vs mask(). I will update.
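i.e. unmask in the reverse order of mask (sketch):

static void imsic_pci_mask_irq(struct irq_data *d)
{
	pci_msi_mask_irq(d);
	irq_chip_mask_parent(d);
}

static void imsic_pci_unmask_irq(struct irq_data *d)
{
	irq_chip_unmask_parent(d);
	pci_msi_unmask_irq(d);
}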
Regards,
Anup
On Sat, Feb 17, 2024 at 2:20 AM Thomas Gleixner <[email protected]> wrote:
>
> On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> > +static int aplic_direct_irqdomain_translate(struct irq_domain *d,
> > + struct irq_fwspec *fwspec,
> > + unsigned long *hwirq,
> > + unsigned int *type)
>
> Please align the arguments to the first argument of the first line and
> use the 100 characters, i.e.
>
> static int aplic_direct_irqdomain_translate(struct irq_domain *d, struct irq_fwspec *fwspec,
> unsigned long *hwirq, unsigned int *type)
> {
>
> All over the place.
Okay, I will update.
>
> > +{
> > + struct aplic_priv *priv = d->host_data;
> > +
> > + return aplic_irqdomain_translate(fwspec, priv->gsi_base,
> > + hwirq, type);
> > +}
> > +
> > +static int aplic_direct_irqdomain_alloc(struct irq_domain *domain,
> > + unsigned int virq, unsigned int nr_irqs,
> > + void *arg)
> > +{
> > + int i, ret;
> > + unsigned int type;
> > + irq_hw_number_t hwirq;
> > + struct irq_fwspec *fwspec = arg;
> > + struct aplic_priv *priv = domain->host_data;
> > + struct aplic_direct *direct =
> > + container_of(priv, struct aplic_direct, priv);
>
> Variable ordering. Please make this consistent according to documentation.
Okay, I will update.
>
> > + ret = aplic_irqdomain_translate(fwspec, priv->gsi_base,
> > + &hwirq, &type);
> > + if (ret)
> > + return ret;
> > +
> > + for (i = 0; i < nr_irqs; i++) {
> > + irq_domain_set_info(domain, virq + i, hwirq + i,
> > + &aplic_direct_chip, priv,
> > + handle_fasteoi_irq, NULL, NULL);
> > + irq_set_affinity(virq + i, &direct->lmask);
> > + /* See the reason described in aplic_msi_irqdomain_alloc() */
>
> I still have to understand that "reason". :)
As mentioned on another patch, I will drop it for now. If required,
we can bring it back as a separate patch with clear reasoning.
>
> > + irq_set_status_flags(virq + i, IRQ_DISABLE_UNLAZY);
> > + }
>
> Thanks,
>
> tglx
Regards,
Anup
On Sat, Feb 17, 2024 at 2:34 AM Thomas Gleixner <[email protected]> wrote:
>
> On Sat, Jan 27 2024 at 21:47, Anup Patel wrote:
> > We extend the existing APLIC irqchip driver to support MSI-mode for
> > RISC-V platforms having both wired interrupts and MSIs.
>
> We? Just s/We//
Okay, I will update.
>
> > +
> > +static void aplic_msi_irq_unmask(struct irq_data *d)
> > +{
> > + aplic_irq_unmask(d);
> > + irq_chip_unmask_parent(d);
> > +}
> > +
> > +static void aplic_msi_irq_mask(struct irq_data *d)
> > +{
> > + aplic_irq_mask(d);
> > + irq_chip_mask_parent(d);
> > +}
>
> Again asymmetric vs. unmask()
Okay, I will update.
>
> > +static void aplic_msi_irq_eoi(struct irq_data *d)
> > +{
> > + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> > + u32 reg_off, reg_mask;
> > +
> > + /*
> > + * EOI handling is required only for level-triggered
> > + * interrupts in APLIC MSI mode.
> > + */
> > +
> > + reg_off = APLIC_CLRIP_BASE + ((d->hwirq / APLIC_IRQBITS_PER_REG) * 4);
> > + reg_mask = BIT(d->hwirq % APLIC_IRQBITS_PER_REG);
> > + switch (irqd_get_trigger_type(d)) {
> > + case IRQ_TYPE_LEVEL_LOW:
> > + if (!(readl(priv->regs + reg_off) & reg_mask))
> > + writel(d->hwirq, priv->regs + APLIC_SETIPNUM_LE);
>
> A comment what this condition is for would be nice.
Okay, I will add a comment about the condition.
Regards,
Anup
On Thu, Feb 1, 2024 at 12:09 PM Andy Chiu <[email protected]> wrote:
>
> Hi Anup,
>
> On Sun, Jan 28, 2024 at 12:24 AM Anup Patel <[email protected]> wrote:
> >
> > The RISC-V advanced interrupt architecture (AIA) specification defines
> > advanced platform-level interrupt controller (APLIC) which has two modes
> > of operation: 1) Direct mode and 2) MSI mode.
> > (For more details, refer https://github.com/riscv/riscv-aia)
> >
> > In APLIC direct-mode, wired interrupts are forwarded to CPUs (or HARTs)
> > as a local external interrupt.
> >
> > We add a platform irqchip driver for the RISC-V APLIC direct-mode to
> > support RISC-V platforms having only wired interrupts.
> >
> > Signed-off-by: Anup Patel <[email protected]>
> > ---
> > drivers/irqchip/Kconfig | 5 +
> > drivers/irqchip/Makefile | 1 +
> > drivers/irqchip/irq-riscv-aplic-direct.c | 343 +++++++++++++++++++++++
> > drivers/irqchip/irq-riscv-aplic-main.c | 232 +++++++++++++++
> > drivers/irqchip/irq-riscv-aplic-main.h | 45 +++
> > include/linux/irqchip/riscv-aplic.h | 119 ++++++++
> > 6 files changed, 745 insertions(+)
> > create mode 100644 drivers/irqchip/irq-riscv-aplic-direct.c
> > create mode 100644 drivers/irqchip/irq-riscv-aplic-main.c
> > create mode 100644 drivers/irqchip/irq-riscv-aplic-main.h
> > create mode 100644 include/linux/irqchip/riscv-aplic.h
> >
> > diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> > index 2fc0cb32341a..dbc8811d3764 100644
> > --- a/drivers/irqchip/Kconfig
> > +++ b/drivers/irqchip/Kconfig
> > @@ -546,6 +546,11 @@ config SIFIVE_PLIC
> > select IRQ_DOMAIN_HIERARCHY
> > select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
> >
> > +config RISCV_APLIC
> > + bool
> > + depends on RISCV
> > + select IRQ_DOMAIN_HIERARCHY
> > +
> > config RISCV_IMSIC
> > bool
> > depends on RISCV
> > diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
> > index abca445a3229..7f8289790ed8 100644
> > --- a/drivers/irqchip/Makefile
> > +++ b/drivers/irqchip/Makefile
> > @@ -95,6 +95,7 @@ obj-$(CONFIG_QCOM_MPM) += irq-qcom-mpm.o
> > obj-$(CONFIG_CSKY_MPINTC) += irq-csky-mpintc.o
> > obj-$(CONFIG_CSKY_APB_INTC) += irq-csky-apb-intc.o
> > obj-$(CONFIG_RISCV_INTC) += irq-riscv-intc.o
> > +obj-$(CONFIG_RISCV_APLIC) += irq-riscv-aplic-main.o irq-riscv-aplic-direct.o
> > obj-$(CONFIG_RISCV_IMSIC) += irq-riscv-imsic-state.o irq-riscv-imsic-early.o irq-riscv-imsic-platform.o
> > obj-$(CONFIG_SIFIVE_PLIC) += irq-sifive-plic.o
> > obj-$(CONFIG_IMX_IRQSTEER) += irq-imx-irqsteer.o
> > diff --git a/drivers/irqchip/irq-riscv-aplic-direct.c b/drivers/irqchip/irq-riscv-aplic-direct.c
> > new file mode 100644
> > index 000000000000..9ed2666bfb5e
> > --- /dev/null
> > +++ b/drivers/irqchip/irq-riscv-aplic-direct.c
> > @@ -0,0 +1,343 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> > + * Copyright (C) 2022 Ventana Micro Systems Inc.
> > + */
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/cpu.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/irqchip.h>
> > +#include <linux/irqchip/chained_irq.h>
> > +#include <linux/irqchip/riscv-aplic.h>
> > +#include <linux/module.h>
> > +#include <linux/of_address.h>
> > +#include <linux/printk.h>
> > +#include <linux/smp.h>
> > +
> > +#include "irq-riscv-aplic-main.h"
> > +
> > +#define APLIC_DISABLE_IDELIVERY 0
> > +#define APLIC_ENABLE_IDELIVERY 1
> > +#define APLIC_DISABLE_ITHRESHOLD 1
> > +#define APLIC_ENABLE_ITHRESHOLD 0
> > +
> > +struct aplic_direct {
> > + struct aplic_priv priv;
> > + struct irq_domain *irqdomain;
> > + struct cpumask lmask;
> > +};
> > +
> > +struct aplic_idc {
> > + unsigned int hart_index;
> > + void __iomem *regs;
> > + struct aplic_direct *direct;
> > +};
> > +
> > +static unsigned int aplic_direct_parent_irq;
> > +static DEFINE_PER_CPU(struct aplic_idc, aplic_idcs);
> > +
> > +static void aplic_direct_irq_eoi(struct irq_data *d)
> > +{
> > + /*
> > + * The fasteoi_handler requires irq_eoi() callback hence
> > + * provide a dummy handler.
> > + */
> > +}
> > +
> > +#ifdef CONFIG_SMP
> > +static int aplic_direct_set_affinity(struct irq_data *d,
> > + const struct cpumask *mask_val, bool force)
> > +{
> > + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> > + struct aplic_direct *direct =
> > + container_of(priv, struct aplic_direct, priv);
> > + struct aplic_idc *idc;
> > + unsigned int cpu, val;
> > + struct cpumask amask;
> > + void __iomem *target;
> > +
> > + cpumask_and(&amask, &direct->lmask, mask_val);
> > +
> > + if (force)
> > + cpu = cpumask_first(&amask);
> > + else
> > + cpu = cpumask_any_and(&amask, cpu_online_mask);
> > +
> > + if (cpu >= nr_cpu_ids)
> > + return -EINVAL;
> > +
> > + idc = per_cpu_ptr(&aplic_idcs, cpu);
> > + target = priv->regs + APLIC_TARGET_BASE;
> > + target += (d->hwirq - 1) * sizeof(u32);
> > + val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
> > + val <<= APLIC_TARGET_HART_IDX_SHIFT;
> > + val |= APLIC_DEFAULT_PRIORITY;
> > + writel(val, target);
> > +
> > + irq_data_update_effective_affinity(d, cpumask_of(cpu));
> > +
> > + return IRQ_SET_MASK_OK_DONE;
> > +}
> > +#endif
> > +
> > +static struct irq_chip aplic_direct_chip = {
> > + .name = "APLIC-DIRECT",
> > + .irq_mask = aplic_irq_mask,
> > + .irq_unmask = aplic_irq_unmask,
> > + .irq_set_type = aplic_irq_set_type,
> > + .irq_eoi = aplic_direct_irq_eoi,
> > +#ifdef CONFIG_SMP
> > + .irq_set_affinity = aplic_direct_set_affinity,
> > +#endif
> > + .flags = IRQCHIP_SET_TYPE_MASKED |
> > + IRQCHIP_SKIP_SET_WAKE |
> > + IRQCHIP_MASK_ON_SUSPEND,
> > +};
> > +
> > +static int aplic_direct_irqdomain_translate(struct irq_domain *d,
> > + struct irq_fwspec *fwspec,
> > + unsigned long *hwirq,
> > + unsigned int *type)
> > +{
> > + struct aplic_priv *priv = d->host_data;
> > +
> > + return aplic_irqdomain_translate(fwspec, priv->gsi_base,
> > + hwirq, type);
> > +}
> > +
> > +static int aplic_direct_irqdomain_alloc(struct irq_domain *domain,
> > + unsigned int virq, unsigned int nr_irqs,
> > + void *arg)
> > +{
> > + int i, ret;
> > + unsigned int type;
> > + irq_hw_number_t hwirq;
> > + struct irq_fwspec *fwspec = arg;
> > + struct aplic_priv *priv = domain->host_data;
> > + struct aplic_direct *direct =
> > + container_of(priv, struct aplic_direct, priv);
> > +
> > + ret = aplic_irqdomain_translate(fwspec, priv->gsi_base,
> > + &hwirq, &type);
> > + if (ret)
> > + return ret;
> > +
> > + for (i = 0; i < nr_irqs; i++) {
> > + irq_domain_set_info(domain, virq + i, hwirq + i,
> > + &aplic_direct_chip, priv,
> > + handle_fasteoi_irq, NULL, NULL);
> > + irq_set_affinity(virq + i, &direct->lmask);
> > + /* See the reason described in aplic_msi_irqdomain_alloc() */
> > + irq_set_status_flags(virq + i, IRQ_DISABLE_UNLAZY);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct irq_domain_ops aplic_direct_irqdomain_ops = {
> > + .translate = aplic_direct_irqdomain_translate,
> > + .alloc = aplic_direct_irqdomain_alloc,
> > + .free = irq_domain_free_irqs_top,
> > +};
> > +
> > +/*
> > + * To handle APLIC direct interrupts, we just read the CLAIMI register
> > + * which will return the highest priority pending interrupt and clear the
> > + * pending bit of the interrupt. This process is repeated until CLAIMI
> > + * register return zero value.
> > + */
> > +static void aplic_direct_handle_irq(struct irq_desc *desc)
> > +{
> > + struct aplic_idc *idc = this_cpu_ptr(&aplic_idcs);
> > + struct irq_chip *chip = irq_desc_get_chip(desc);
> > + struct irq_domain *irqdomain = idc->direct->irqdomain;
> > + irq_hw_number_t hw_irq;
> > + int irq;
> > +
> > + chained_irq_enter(chip, desc);
> > +
> > + while ((hw_irq = readl(idc->regs + APLIC_IDC_CLAIMI))) {
> > + hw_irq = hw_irq >> APLIC_IDC_TOPI_ID_SHIFT;
> > + irq = irq_find_mapping(irqdomain, hw_irq);
> > +
> > + if (unlikely(irq <= 0))
> > + dev_warn_ratelimited(idc->direct->priv.dev,
> > + "hw_irq %lu mapping not found\n",
> > + hw_irq);
> > + else
> > + generic_handle_irq(irq);
> > + }
> > +
> > + chained_irq_exit(chip, desc);
> > +}
> > +
> > +static void aplic_idc_set_delivery(struct aplic_idc *idc, bool en)
> > +{
> > + u32 de = (en) ? APLIC_ENABLE_IDELIVERY : APLIC_DISABLE_IDELIVERY;
> > + u32 th = (en) ? APLIC_ENABLE_ITHRESHOLD : APLIC_DISABLE_ITHRESHOLD;
> > +
> > + /* Priority must be less than threshold for interrupt triggering */
> > + writel(th, idc->regs + APLIC_IDC_ITHRESHOLD);
> > +
> > + /* Delivery must be set to 1 for interrupt triggering */
> > + writel(de, idc->regs + APLIC_IDC_IDELIVERY);
> > +}
> > +
> > +static int aplic_direct_dying_cpu(unsigned int cpu)
> > +{
> > + if (aplic_direct_parent_irq)
> > + disable_percpu_irq(aplic_direct_parent_irq);
> > +
> > + return 0;
> > +}
> > +
> > +static int aplic_direct_starting_cpu(unsigned int cpu)
> > +{
> > + if (aplic_direct_parent_irq)
> > + enable_percpu_irq(aplic_direct_parent_irq,
> > + irq_get_trigger_type(aplic_direct_parent_irq));
> > +
> > + return 0;
> > +}
> > +
> > +static int aplic_direct_parse_parent_hwirq(struct device *dev,
> > + u32 index, u32 *parent_hwirq,
> > + unsigned long *parent_hartid)
> > +{
> > + struct of_phandle_args parent;
> > + int rc;
> > +
> > + /*
> > + * Currently, only OF fwnode is supported so extend this
> > + * function for ACPI support.
> > + */
> > + if (!is_of_node(dev->fwnode))
> > + return -EINVAL;
> > +
> > + rc = of_irq_parse_one(to_of_node(dev->fwnode), index, &parent);
> > + if (rc)
> > + return rc;
> > +
> > + rc = riscv_of_parent_hartid(parent.np, parent_hartid);
> > + if (rc)
> > + return rc;
> > +
> > + *parent_hwirq = parent.args[0];
> > + return 0;
> > +}
> > +
> > +int aplic_direct_setup(struct device *dev, void __iomem *regs)
> > +{
> > + int i, j, rc, cpu, setup_count = 0;
> > + struct aplic_direct *direct;
> > + struct aplic_priv *priv;
> > + struct irq_domain *domain;
> > + unsigned long hartid;
> > + struct aplic_idc *idc;
> > + u32 val, hwirq;
> > +
> > + direct = kzalloc(sizeof(*direct), GFP_KERNEL);
> > + if (!direct)
> > + return -ENOMEM;
> > + priv = &direct->priv;
> > +
> > + rc = aplic_setup_priv(priv, dev, regs);
> > + if (rc) {
> > + dev_err(dev, "failed to create APLIC context\n");
> > + kfree(direct);
> > + return rc;
> > + }
> > +
> > + /* Setup per-CPU IDC and target CPU mask */
> > + for (i = 0; i < priv->nr_idcs; i++) {
> > + rc = aplic_direct_parse_parent_hwirq(dev, i, &hwirq, &hartid);
> > + if (rc) {
> > + dev_warn(dev, "parent irq for IDC%d not found\n", i);
> > + continue;
> > + }
> > +
> > + /*
> > + * Skip interrupts other than external interrupts for
> > + * current privilege level.
> > + */
> > + if (hwirq != RV_IRQ_EXT)
> > + continue;
> > +
> > + cpu = riscv_hartid_to_cpuid(hartid);
> > + if (cpu < 0) {
> > + dev_warn(dev, "invalid cpuid for IDC%d\n", i);
> > + continue;
> > + }
> > +
> > + cpumask_set_cpu(cpu, &direct->lmask);
> > +
> > + idc = per_cpu_ptr(&aplic_idcs, cpu);
> > + idc->hart_index = i;
> > + idc->regs = priv->regs + APLIC_IDC_BASE + i * APLIC_IDC_SIZE;
> > + idc->direct = direct;
> > +
> > + aplic_idc_set_delivery(idc, true);
> > +
> > + /*
> > + * Boot cpu might not have APLIC hart_index = 0 so check
> > + * and update target registers of all interrupts.
> > + */
>
> IIUC, the use of smp_processor_id() has to be protected by turning off
> preemption. So maybe please consider adding:
>
> + preempt_disable();
>
> > + if (cpu == smp_processor_id() && idc->hart_index) {
> > + val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
> > + val <<= APLIC_TARGET_HART_IDX_SHIFT;
> > + val |= APLIC_DEFAULT_PRIORITY;
> > + for (j = 1; j <= priv->nr_irqs; j++)
> > + writel(val, priv->regs + APLIC_TARGET_BASE +
> > + (j - 1) * sizeof(u32));
> > + }
>
> , and here:
> + preempt_enable();
>
> Or use get_cpu()/put_cpu() variant to guard the use of processor id.
get_cpu()/put_cpu() is better, so I'll use that in the next revision.
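i.e. roughly (sketch; curr_cpu would be a new local variable):

	curr_cpu = get_cpu();
	if (cpu == curr_cpu && idc->hart_index) {
		val = idc->hart_index & APLIC_TARGET_HART_IDX_MASK;
		val <<= APLIC_TARGET_HART_IDX_SHIFT;
		val |= APLIC_DEFAULT_PRIORITY;
		for (j = 1; j <= priv->nr_irqs; j++)
			writel(val, priv->regs + APLIC_TARGET_BASE +
			       (j - 1) * sizeof(u32));
	}
	put_cpu();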
Regards,
Anup
>
> > +
> > + setup_count++;
> > + }
> > +
> > + /* Find parent domain and register chained handler */
> > + domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
> > + DOMAIN_BUS_ANY);
> > + if (!aplic_direct_parent_irq && domain) {
> > + aplic_direct_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
> > + if (aplic_direct_parent_irq) {
> > + irq_set_chained_handler(aplic_direct_parent_irq,
> > + aplic_direct_handle_irq);
> > +
> > + /*
> > + * Setup CPUHP notifier to enable parent
> > + * interrupt on all CPUs
> > + */
> > + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> > + "irqchip/riscv/aplic:starting",
> > + aplic_direct_starting_cpu,
> > + aplic_direct_dying_cpu);
> > + }
> > + }
> > +
> > + /* Fail if we were not able to setup IDC for any CPU */
> > + if (!setup_count) {
> > + kfree(direct);
> > + return -ENODEV;
> > + }
> > +
> > + /* Setup global config and interrupt delivery */
> > + aplic_init_hw_global(priv, false);
> > +
> > + /* Create irq domain instance for the APLIC */
> > + direct->irqdomain = irq_domain_create_linear(dev->fwnode,
> > + priv->nr_irqs + 1,
> > + &aplic_direct_irqdomain_ops,
> > + priv);
> > + if (!direct->irqdomain) {
> > + dev_err(dev, "failed to create direct irq domain\n");
> > + kfree(direct);
> > + return -ENOMEM;
> > + }
> > +
> > + /* Advertise the interrupt controller */
> > + dev_info(dev, "%d interrupts directly connected to %d CPUs\n",
> > + priv->nr_irqs, priv->nr_idcs);
> > +
> > + return 0;
> > +}
> > diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
> > new file mode 100644
> > index 000000000000..87450708a733
> > --- /dev/null
> > +++ b/drivers/irqchip/irq-riscv-aplic-main.c
> > @@ -0,0 +1,232 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> > + * Copyright (C) 2022 Ventana Micro Systems Inc.
> > + */
> > +
> > +#include <linux/of.h>
> > +#include <linux/of_irq.h>
> > +#include <linux/printk.h>
> > +#include <linux/module.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/irqchip/riscv-aplic.h>
> > +
> > +#include "irq-riscv-aplic-main.h"
> > +
> > +void aplic_irq_unmask(struct irq_data *d)
> > +{
> > + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> > +
> > + writel(d->hwirq, priv->regs + APLIC_SETIENUM);
> > +}
> > +
> > +void aplic_irq_mask(struct irq_data *d)
> > +{
> > + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> > +
> > + writel(d->hwirq, priv->regs + APLIC_CLRIENUM);
> > +}
> > +
> > +int aplic_irq_set_type(struct irq_data *d, unsigned int type)
> > +{
> > + u32 val = 0;
> > + void __iomem *sourcecfg;
> > + struct aplic_priv *priv = irq_data_get_irq_chip_data(d);
> > +
> > + switch (type) {
> > + case IRQ_TYPE_NONE:
> > + val = APLIC_SOURCECFG_SM_INACTIVE;
> > + break;
> > + case IRQ_TYPE_LEVEL_LOW:
> > + val = APLIC_SOURCECFG_SM_LEVEL_LOW;
> > + break;
> > + case IRQ_TYPE_LEVEL_HIGH:
> > + val = APLIC_SOURCECFG_SM_LEVEL_HIGH;
> > + break;
> > + case IRQ_TYPE_EDGE_FALLING:
> > + val = APLIC_SOURCECFG_SM_EDGE_FALL;
> > + break;
> > + case IRQ_TYPE_EDGE_RISING:
> > + val = APLIC_SOURCECFG_SM_EDGE_RISE;
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + sourcecfg = priv->regs + APLIC_SOURCECFG_BASE;
> > + sourcecfg += (d->hwirq - 1) * sizeof(u32);
> > + writel(val, sourcecfg);
> > +
> > + return 0;
> > +}
> > +
> > +int aplic_irqdomain_translate(struct irq_fwspec *fwspec, u32 gsi_base,
> > + unsigned long *hwirq, unsigned int *type)
> > +{
> > + if (WARN_ON(fwspec->param_count < 2))
> > + return -EINVAL;
> > + if (WARN_ON(!fwspec->param[0]))
> > + return -EINVAL;
> > +
> > + /* For DT, gsi_base is always zero. */
> > + *hwirq = fwspec->param[0] - gsi_base;
> > + *type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
> > +
> > + WARN_ON(*type == IRQ_TYPE_NONE);
> > +
> > + return 0;
> > +}
> > +
> > +void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode)
> > +{
> > + u32 val;
> > +#ifdef CONFIG_RISCV_M_MODE
> > + u32 valH;
> > +
> > + if (msi_mode) {
> > + val = priv->msicfg.base_ppn;
> > + valH = ((u64)priv->msicfg.base_ppn >> 32) &
> > + APLIC_xMSICFGADDRH_BAPPN_MASK;
> > + valH |= (priv->msicfg.lhxw & APLIC_xMSICFGADDRH_LHXW_MASK)
> > + << APLIC_xMSICFGADDRH_LHXW_SHIFT;
> > + valH |= (priv->msicfg.hhxw & APLIC_xMSICFGADDRH_HHXW_MASK)
> > + << APLIC_xMSICFGADDRH_HHXW_SHIFT;
> > + valH |= (priv->msicfg.lhxs & APLIC_xMSICFGADDRH_LHXS_MASK)
> > + << APLIC_xMSICFGADDRH_LHXS_SHIFT;
> > + valH |= (priv->msicfg.hhxs & APLIC_xMSICFGADDRH_HHXS_MASK)
> > + << APLIC_xMSICFGADDRH_HHXS_SHIFT;
> > + writel(val, priv->regs + APLIC_xMSICFGADDR);
> > + writel(valH, priv->regs + APLIC_xMSICFGADDRH);
> > + }
> > +#endif
> > +
> > + /* Setup APLIC domaincfg register */
> > + val = readl(priv->regs + APLIC_DOMAINCFG);
> > + val |= APLIC_DOMAINCFG_IE;
> > + if (msi_mode)
> > + val |= APLIC_DOMAINCFG_DM;
> > + writel(val, priv->regs + APLIC_DOMAINCFG);
> > + if (readl(priv->regs + APLIC_DOMAINCFG) != val)
> > + dev_warn(priv->dev, "unable to write 0x%x in domaincfg\n",
> > + val);
> > +}
> > +
> > +static void aplic_init_hw_irqs(struct aplic_priv *priv)
> > +{
> > + int i;
> > +
> > + /* Disable all interrupts */
> > + for (i = 0; i <= priv->nr_irqs; i += 32)
> > + writel(-1U, priv->regs + APLIC_CLRIE_BASE +
> > + (i / 32) * sizeof(u32));
> > +
> > + /* Set interrupt type and default priority for all interrupts */
> > + for (i = 1; i <= priv->nr_irqs; i++) {
> > + writel(0, priv->regs + APLIC_SOURCECFG_BASE +
> > + (i - 1) * sizeof(u32));
> > + writel(APLIC_DEFAULT_PRIORITY,
> > + priv->regs + APLIC_TARGET_BASE +
> > + (i - 1) * sizeof(u32));
> > + }
> > +
> > + /* Clear APLIC domaincfg */
> > + writel(0, priv->regs + APLIC_DOMAINCFG);
> > +}
> > +
> > +int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
> > + void __iomem *regs)
> > +{
> > + struct of_phandle_args parent;
> > + int rc;
> > +
> > + /*
> > + * Currently, only OF fwnode is supported so extend this
> > + * function for ACPI support.
> > + */
> > + if (!is_of_node(dev->fwnode))
> > + return -EINVAL;
> > +
> > + /* Save device pointer and register base */
> > + priv->dev = dev;
> > + priv->regs = regs;
> > +
> > + /* Find out number of interrupt sources */
> > + rc = of_property_read_u32(to_of_node(dev->fwnode),
> > + "riscv,num-sources",
> > + &priv->nr_irqs);
> > + if (rc) {
> > + dev_err(dev, "failed to get number of interrupt sources\n");
> > + return rc;
> > + }
> > +
> > + /*
> > + * Find out number of IDCs based on parent interrupts
> > + *
> > + * If "msi-parent" property is present then we ignore the
> > + * APLIC IDCs which forces the APLIC driver to use MSI mode.
> > + */
> > + if (!of_property_present(to_of_node(dev->fwnode), "msi-parent")) {
> > + while (!of_irq_parse_one(to_of_node(dev->fwnode),
> > + priv->nr_idcs, &parent))
> > + priv->nr_idcs++;
> > + }
> > +
> > + /* Setup initial state APLIC interrupts */
> > + aplic_init_hw_irqs(priv);
> > +
> > + return 0;
> > +}
> > +
> > +static int aplic_probe(struct platform_device *pdev)
> > +{
> > + struct device *dev = &pdev->dev;
> > + bool msi_mode = false;
> > + struct resource *res;
> > + void __iomem *regs;
> > + int rc;
> > +
> > + /* Map the MMIO registers */
> > + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > + if (!res) {
> > + dev_err(dev, "failed to get MMIO resource\n");
> > + return -EINVAL;
> > + }
> > + regs = devm_ioremap(&pdev->dev, res->start, resource_size(res));
> > + if (!regs) {
> > + dev_err(dev, "failed map MMIO registers\n");
> > + return -ENOMEM;
> > + }
> > +
> > + /*
> > +	 * If the msi-parent property is present then setup APLIC MSI
> > +	 * mode, otherwise setup APLIC direct mode.
> > + */
> > + if (is_of_node(dev->fwnode))
> > + msi_mode = of_property_present(to_of_node(dev->fwnode),
> > + "msi-parent");
> > + if (msi_mode)
> > + rc = -ENODEV;
> > + else
> > + rc = aplic_direct_setup(dev, regs);
> > + if (rc) {
> > +		dev_err(dev, "failed to setup APLIC in %s mode\n",
> > + msi_mode ? "MSI" : "direct");
> > + return rc;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct of_device_id aplic_match[] = {
> > + { .compatible = "riscv,aplic" },
> > + {}
> > +};
> > +
> > +static struct platform_driver aplic_driver = {
> > + .driver = {
> > + .name = "riscv-aplic",
> > + .of_match_table = aplic_match,
> > + },
> > + .probe = aplic_probe,
> > +};
> > +builtin_platform_driver(aplic_driver);
> > diff --git a/drivers/irqchip/irq-riscv-aplic-main.h b/drivers/irqchip/irq-riscv-aplic-main.h
> > new file mode 100644
> > index 000000000000..474a04229334
> > --- /dev/null
> > +++ b/drivers/irqchip/irq-riscv-aplic-main.h
> > @@ -0,0 +1,45 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> > + * Copyright (C) 2022 Ventana Micro Systems Inc.
> > + */
> > +
> > +#ifndef _IRQ_RISCV_APLIC_MAIN_H
> > +#define _IRQ_RISCV_APLIC_MAIN_H
> > +
> > +#include <linux/device.h>
> > +#include <linux/io.h>
> > +#include <linux/irq.h>
> > +#include <linux/irqdomain.h>
> > +#include <linux/fwnode.h>
> > +
> > +#define APLIC_DEFAULT_PRIORITY 1
> > +
> > +struct aplic_msicfg {
> > + phys_addr_t base_ppn;
> > + u32 hhxs;
> > + u32 hhxw;
> > + u32 lhxs;
> > + u32 lhxw;
> > +};
> > +
> > +struct aplic_priv {
> > + struct device *dev;
> > + u32 gsi_base;
> > + u32 nr_irqs;
> > + u32 nr_idcs;
> > + void __iomem *regs;
> > + struct aplic_msicfg msicfg;
> > +};
> > +
> > +void aplic_irq_unmask(struct irq_data *d);
> > +void aplic_irq_mask(struct irq_data *d);
> > +int aplic_irq_set_type(struct irq_data *d, unsigned int type);
> > +int aplic_irqdomain_translate(struct irq_fwspec *fwspec, u32 gsi_base,
> > + unsigned long *hwirq, unsigned int *type);
> > +void aplic_init_hw_global(struct aplic_priv *priv, bool msi_mode);
> > +int aplic_setup_priv(struct aplic_priv *priv, struct device *dev,
> > + void __iomem *regs);
> > +int aplic_direct_setup(struct device *dev, void __iomem *regs);
> > +
> > +#endif
> > diff --git a/include/linux/irqchip/riscv-aplic.h b/include/linux/irqchip/riscv-aplic.h
> > new file mode 100644
> > index 000000000000..97e198ea0109
> > --- /dev/null
> > +++ b/include/linux/irqchip/riscv-aplic.h
> > @@ -0,0 +1,119 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2021 Western Digital Corporation or its affiliates.
> > + * Copyright (C) 2022 Ventana Micro Systems Inc.
> > + */
> > +#ifndef __LINUX_IRQCHIP_RISCV_APLIC_H
> > +#define __LINUX_IRQCHIP_RISCV_APLIC_H
> > +
> > +#include <linux/bitops.h>
> > +
> > +#define APLIC_MAX_IDC BIT(14)
> > +#define APLIC_MAX_SOURCE 1024
> > +
> > +#define APLIC_DOMAINCFG 0x0000
> > +#define APLIC_DOMAINCFG_RDONLY 0x80000000
> > +#define APLIC_DOMAINCFG_IE BIT(8)
> > +#define APLIC_DOMAINCFG_DM BIT(2)
> > +#define APLIC_DOMAINCFG_BE BIT(0)
> > +
> > +#define APLIC_SOURCECFG_BASE 0x0004
> > +#define APLIC_SOURCECFG_D BIT(10)
> > +#define APLIC_SOURCECFG_CHILDIDX_MASK 0x000003ff
> > +#define APLIC_SOURCECFG_SM_MASK 0x00000007
> > +#define APLIC_SOURCECFG_SM_INACTIVE 0x0
> > +#define APLIC_SOURCECFG_SM_DETACH 0x1
> > +#define APLIC_SOURCECFG_SM_EDGE_RISE 0x4
> > +#define APLIC_SOURCECFG_SM_EDGE_FALL 0x5
> > +#define APLIC_SOURCECFG_SM_LEVEL_HIGH 0x6
> > +#define APLIC_SOURCECFG_SM_LEVEL_LOW 0x7
> > +
> > +#define APLIC_MMSICFGADDR 0x1bc0
> > +#define APLIC_MMSICFGADDRH 0x1bc4
> > +#define APLIC_SMSICFGADDR 0x1bc8
> > +#define APLIC_SMSICFGADDRH 0x1bcc
> > +
> > +#ifdef CONFIG_RISCV_M_MODE
> > +#define APLIC_xMSICFGADDR APLIC_MMSICFGADDR
> > +#define APLIC_xMSICFGADDRH APLIC_MMSICFGADDRH
> > +#else
> > +#define APLIC_xMSICFGADDR APLIC_SMSICFGADDR
> > +#define APLIC_xMSICFGADDRH APLIC_SMSICFGADDRH
> > +#endif
> > +
> > +#define APLIC_xMSICFGADDRH_L BIT(31)
> > +#define APLIC_xMSICFGADDRH_HHXS_MASK 0x1f
> > +#define APLIC_xMSICFGADDRH_HHXS_SHIFT 24
> > +#define APLIC_xMSICFGADDRH_LHXS_MASK 0x7
> > +#define APLIC_xMSICFGADDRH_LHXS_SHIFT 20
> > +#define APLIC_xMSICFGADDRH_HHXW_MASK 0x7
> > +#define APLIC_xMSICFGADDRH_HHXW_SHIFT 16
> > +#define APLIC_xMSICFGADDRH_LHXW_MASK 0xf
> > +#define APLIC_xMSICFGADDRH_LHXW_SHIFT 12
> > +#define APLIC_xMSICFGADDRH_BAPPN_MASK 0xfff
> > +
> > +#define APLIC_xMSICFGADDR_PPN_SHIFT 12
> > +
> > +#define APLIC_xMSICFGADDR_PPN_HART(__lhxs) \
> > + (BIT(__lhxs) - 1)
> > +
> > +#define APLIC_xMSICFGADDR_PPN_LHX_MASK(__lhxw) \
> > + (BIT(__lhxw) - 1)
> > +#define APLIC_xMSICFGADDR_PPN_LHX_SHIFT(__lhxs) \
> > + ((__lhxs))
> > +#define APLIC_xMSICFGADDR_PPN_LHX(__lhxw, __lhxs) \
> > + (APLIC_xMSICFGADDR_PPN_LHX_MASK(__lhxw) << \
> > + APLIC_xMSICFGADDR_PPN_LHX_SHIFT(__lhxs))
> > +
> > +#define APLIC_xMSICFGADDR_PPN_HHX_MASK(__hhxw) \
> > + (BIT(__hhxw) - 1)
> > +#define APLIC_xMSICFGADDR_PPN_HHX_SHIFT(__hhxs) \
> > + ((__hhxs) + APLIC_xMSICFGADDR_PPN_SHIFT)
> > +#define APLIC_xMSICFGADDR_PPN_HHX(__hhxw, __hhxs) \
> > + (APLIC_xMSICFGADDR_PPN_HHX_MASK(__hhxw) << \
> > + APLIC_xMSICFGADDR_PPN_HHX_SHIFT(__hhxs))
> > +
> > +#define APLIC_IRQBITS_PER_REG 32
> > +
> > +#define APLIC_SETIP_BASE 0x1c00
> > +#define APLIC_SETIPNUM 0x1cdc
> > +
> > +#define APLIC_CLRIP_BASE 0x1d00
> > +#define APLIC_CLRIPNUM 0x1ddc
> > +
> > +#define APLIC_SETIE_BASE 0x1e00
> > +#define APLIC_SETIENUM 0x1edc
> > +
> > +#define APLIC_CLRIE_BASE 0x1f00
> > +#define APLIC_CLRIENUM 0x1fdc
> > +
> > +#define APLIC_SETIPNUM_LE 0x2000
> > +#define APLIC_SETIPNUM_BE 0x2004
> > +
> > +#define APLIC_GENMSI 0x3000
> > +
> > +#define APLIC_TARGET_BASE 0x3004
> > +#define APLIC_TARGET_HART_IDX_SHIFT 18
> > +#define APLIC_TARGET_HART_IDX_MASK 0x3fff
> > +#define APLIC_TARGET_GUEST_IDX_SHIFT 12
> > +#define APLIC_TARGET_GUEST_IDX_MASK 0x3f
> > +#define APLIC_TARGET_IPRIO_MASK 0xff
> > +#define APLIC_TARGET_EIID_MASK 0x7ff
> > +
> > +#define APLIC_IDC_BASE 0x4000
> > +#define APLIC_IDC_SIZE 32
> > +
> > +#define APLIC_IDC_IDELIVERY 0x00
> > +
> > +#define APLIC_IDC_IFORCE 0x04
> > +
> > +#define APLIC_IDC_ITHRESHOLD 0x08
> > +
> > +#define APLIC_IDC_TOPI 0x18
> > +#define APLIC_IDC_TOPI_ID_SHIFT 16
> > +#define APLIC_IDC_TOPI_ID_MASK 0x3ff
> > +#define APLIC_IDC_TOPI_PRIO_MASK 0xff
> > +
> > +#define APLIC_IDC_CLAIMI 0x1c
> > +
> > +#endif
> > --
> > 2.34.1
> >
> >
>
> Thanks,
> Andy
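As context for the xMSICFGADDR/xMSICFGADDRH programming in the patch above: the LHXW, HHXW, LHXS, and HHXS fields describe how the per-hart IMSIC interrupt-file pages are laid out relative to the base PPN. Below is a minimal sketch of how a target MSI address could be composed from those fields, using only the macros from the riscv-aplic.h header added by this patch; the helper name is hypothetical and not part of the posted driver, and it assumes the quoted header and struct aplic_msicfg are in scope.

static phys_addr_t aplic_example_msi_addr(const struct aplic_msicfg *mc,
					  u32 group, u32 hart, u32 guest)
{
	phys_addr_t ppn = mc->base_ppn;

	/* Group index lands above the low hart index bits */
	ppn |= (phys_addr_t)(group & APLIC_xMSICFGADDR_PPN_HHX_MASK(mc->hhxw))
			<< APLIC_xMSICFGADDR_PPN_HHX_SHIFT(mc->hhxs);
	/* Hart index within the group */
	ppn |= (phys_addr_t)(hart & APLIC_xMSICFGADDR_PPN_LHX_MASK(mc->lhxw))
			<< APLIC_xMSICFGADDR_PPN_LHX_SHIFT(mc->lhxs);
	/* Guest interrupt file index occupies the lowest LHXS bits */
	ppn |= guest & APLIC_xMSICFGADDR_PPN_HART(mc->lhxs);

	/* The PPN addresses 4KiB pages, i.e. individual IMSIC interrupt files */
	return ppn << APLIC_xMSICFGADDR_PPN_SHIFT;
}

The resulting address selects the 4KiB IMSIC interrupt-file page for the given group, hart, and guest index, which is where the APLIC directs MSIs when operating in MSI mode.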
On Mon, 19 Feb 2024 15:50:36 +0000,
Biju Das <[email protected]> wrote:
>
> > Now that the GIC-v3 callback can handle invocation with a fwspec parameter
> > count of 0 lift the restriction in the core code and invoke select()
> > unconditionally when the domain provides it.
>
> This patch breaks on RZ/G2L SMARC EVK as of_phandle_args_to_fwspec count()
> is called after irq_find_matching_fwspec() is causing fwspec->param_count=0
> and this results in boot failure as the patch removes the check.
>
> Maybe we need to revert this patch or fix the fundamental issue.
>
> Cheers,
> Biju
> ---
> kernel/irq/irqdomain.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 0bdef4f..8fee379 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -448,7 +448,7 @@ struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
> */
> mutex_lock(&irq_domain_mutex);
> list_for_each_entry(h, &irq_domain_list, link) {
> - if (h->ops->select && fwspec->param_count)
> + if (h->ops->select)
> rc = h->ops->select(h, fwspec, bus_token);
> else if (h->ops->match)
> rc = h->ops->match(h, to_of_node(fwnode), bus_token);
>
>
Dmitry posted his take on this at [1], and I have suggested another
possible fix in my reply.
Could you please give both patches a go?
Thanks,
M.
[1] https://lore.kernel.org/r/[email protected]
--
Without deviation from the norm, progress is not possible.
> Now that the GIC-v3 callback can handle invocation with a fwspec parameter
> count of 0 lift the restriction in the core code and invoke select()
> unconditionally when the domain provides it.
This patch breaks the RZ/G2L SMARC EVK: of_phandle_args_to_fwspec() is
called after irq_find_matching_fwspec(), which leaves fwspec->param_count
at 0, and since this patch removes the check the result is a boot failure.
Maybe we need to revert this patch or fix the fundamental issue.
Cheers,
Biju
---
kernel/irq/irqdomain.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 0bdef4f..8fee379 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -448,7 +448,7 @@ struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
*/
mutex_lock(&irq_domain_mutex);
list_for_each_entry(h, &irq_domain_list, link) {
- if (h->ops->select && fwspec->param_count)
+ if (h->ops->select)
rc = h->ops->select(h, fwspec, bus_token);
else if (h->ops->match)
rc = h->ops->match(h, to_of_node(fwnode), bus_token);
Hi Marc Zyngier,
Thanks for the feedback.
> -----Original Message-----
> From: Marc Zyngier <[email protected]>
> Sent: Monday, February 19, 2024 3:57 PM
> Subject: Re: [tip: irq/msi] genirq/irqdomain: Remove the param count
> restriction from select()
>
> On Mon, 19 Feb 2024 15:50:36 +0000,
> Biju Das <[email protected]> wrote:
> >
> > > Now that the GIC-v3 callback can handle invocation with a fwspec
> > > parameter count of 0 lift the restriction in the core code and
> > > invoke select() unconditionally when the domain provides it.
> >
> > This patch breaks on RZ/G2L SMARC EVK as of_phandle_args_to_fwspec
> > count() is called after irq_find_matching_fwspec() is causing
> > fwspec->param_count=0 and this results in boot failure as the patch
> removes the check.
> >
> > Maybe we need to revert this patch or fix the fundamental issue.
> >
> > Cheers,
> > Biju
> > ---
> > kernel/irq/irqdomain.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index
> > 0bdef4f..8fee379 100644
> > --- a/kernel/irq/irqdomain.c
> > +++ b/kernel/irq/irqdomain.c
> > @@ -448,7 +448,7 @@ struct irq_domain *irq_find_matching_fwspec(struct
> irq_fwspec *fwspec,
> > */
> > mutex_lock(&irq_domain_mutex);
> > list_for_each_entry(h, &irq_domain_list, link) {
> > - if (h->ops->select && fwspec->param_count)
> > + if (h->ops->select)
> > rc = h->ops->select(h, fwspec, bus_token);
> > else if (h->ops->match)
> > rc = h->ops->match(h, to_of_node(fwnode), bus_token);
> >
> >
>
> Dmitry posted his take on this at [1], and I have suggested another
> possible fix in my reply.
>
> Could you please give both patches a go?
I tested and can confirm that both patches look good.
Cheers,
Biju
Hi Marc and Dmitry,
> -----Original Message-----
> From: Biju Das
> Sent: Monday, February 19, 2024 4:39 PM
> Subject: RE: [tip: irq/msi] genirq/irqdomain: Remove the param count
> restriction from select()
>
> Hi Marc Zyngier,
>
> Thanks for the feedback.
>
> > -----Original Message-----
> > From: Marc Zyngier <[email protected]>
> > Sent: Monday, February 19, 2024 3:57 PM
> > Subject: Re: [tip: irq/msi] genirq/irqdomain: Remove the param count
> > restriction from select()
> >
> > On Mon, 19 Feb 2024 15:50:36 +0000,
> > Biju Das <[email protected]> wrote:
> > >
> > > > Now that the GIC-v3 callback can handle invocation with a fwspec
> > > > parameter count of 0 lift the restriction in the core code and
> > > > invoke select() unconditionally when the domain provides it.
> > >
> > > This patch breaks on RZ/G2L SMARC EVK as of_phandle_args_to_fwspec
> > > count() is called after irq_find_matching_fwspec() is causing
> > > fwspec->param_count=0 and this results in boot failure as the patch
> > removes the check.
> > >
> > > Maybe we need to revert this patch or fix the fundamental issue.
> > >
> > > Cheers,
> > > Biju
> > > ---
> > > kernel/irq/irqdomain.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index
> > > 0bdef4f..8fee379 100644
> > > --- a/kernel/irq/irqdomain.c
> > > +++ b/kernel/irq/irqdomain.c
> > > @@ -448,7 +448,7 @@ struct irq_domain
> > > *irq_find_matching_fwspec(struct
> > irq_fwspec *fwspec,
> > > */
> > > mutex_lock(&irq_domain_mutex);
> > > list_for_each_entry(h, &irq_domain_list, link) {
> > > - if (h->ops->select && fwspec->param_count)
> > > + if (h->ops->select)
> > > rc = h->ops->select(h, fwspec, bus_token);
> > > else if (h->ops->match)
> > > rc = h->ops->match(h, to_of_node(fwnode), bus_token);
> > >
> > >
> >
> > Dmitry posted his take on this at [1], and I have suggested another
> > possible fix in my reply.
> >
> > Could you please give both patches a go?
>
> I tested and confirm both the patches looks good.
Please let me know the details of the final patch so that I can test it
and add a Tested-by tag for the Renesas RZ/G2L platforms.
Cheers,
Biju
On Sat, Feb 17, 2024 at 2:35 AM Thomas Gleixner <[email protected]> wrote:
>
> Anup!
>
> On Thu, Feb 15 2024 at 20:59, Thomas Gleixner wrote:
> > I'm going over the rest of the series after I dealt with my other patch
> > backlog.
>
> Aside of the nitpicks I had, this looks pretty reasonable.
Thanks for your review.
I have sent the v13 series. Please have a look.
Regards,
Anup
>
> Thanks,
>
> tglx
On Mon, Feb 19 2024 at 15:56, Marc Zyngier wrote:
>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>> index 0bdef4f..8fee379 100644
>> --- a/kernel/irq/irqdomain.c
>> +++ b/kernel/irq/irqdomain.c
>> @@ -448,7 +448,7 @@ struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
>> */
>> mutex_lock(&irq_domain_mutex);
>> list_for_each_entry(h, &irq_domain_list, link) {
>> - if (h->ops->select && fwspec->param_count)
>> + if (h->ops->select)
>> rc = h->ops->select(h, fwspec, bus_token);
>> else if (h->ops->match)
>> rc = h->ops->match(h, to_of_node(fwnode), bus_token);
>>
>>
>
> Dmitry posted his take on this at [1], and I have suggested another
> possible fix in my reply.
Your core side DOMAIN_BUS_ANY variant makes a lot of sense. Can you
please post a proper patch for that?
Aside of this I just noticed that we need the below too.
Thanks,
tglx
---
Subject: irqchip/imx-intmux: Handle pure domain searches correctly
From: Thomas Gleixner <[email protected]>
Date: Tue, 20 Feb 2024 09:46:19 +0100
The removal of the parameter count restriction in the core code to allow
pure domain token based select() decisions broke the IMX intmux select
callback as that unconditionally expects that there is a parameter.
Add the missing check for zero parameter count and the token match.
Fixes: de1ff306dcf4 ("genirq/irqdomain: Remove the param count restriction from select()")
Signed-off-by: Thomas Gleixner <[email protected]>
---
drivers/irqchip/irq-imx-intmux.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/drivers/irqchip/irq-imx-intmux.c
+++ b/drivers/irqchip/irq-imx-intmux.c
@@ -166,6 +166,10 @@ static int imx_intmux_irq_select(struct
if (fwspec->fwnode != d->fwnode)
return false;
+ /* Handle pure domain searches */
+ if (!fwspec->param_count)
+ return d->bus_token == bus_token;
+
return irqchip_data->chanidx == fwspec->param[1];
}
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 34da27aa8956d3a75c7556a59c9c7cfd0b3f18ab
Gitweb: https://git.kernel.org/tip/34da27aa8956d3a75c7556a59c9c7cfd0b3f18ab
Author: Thomas Gleixner <[email protected]>
AuthorDate: Tue, 20 Feb 2024 09:46:19 +01:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Tue, 20 Feb 2024 17:30:57 +01:00
irqchip/imx-intmux: Handle pure domain searches correctly
The removal of the parameter count restriction in the core code to allow
pure domain token based select() decisions broke the IMX intmux select
callback as that unconditionally expects that there is a parameter.
Add the missing check for zero parameter count and the token match.
Fixes: de1ff306dcf4 ("genirq/irqdomain: Remove the param count restriction from select()")
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/87ttm3ikok.ffs@tglx
---
drivers/irqchip/irq-imx-intmux.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/irqchip/irq-imx-intmux.c b/drivers/irqchip/irq-imx-intmux.c
index aa041e4..65084c7 100644
--- a/drivers/irqchip/irq-imx-intmux.c
+++ b/drivers/irqchip/irq-imx-intmux.c
@@ -166,6 +166,10 @@ static int imx_intmux_irq_select(struct irq_domain *d, struct irq_fwspec *fwspec
if (fwspec->fwnode != d->fwnode)
return false;
+ /* Handle pure domain searches */
+ if (!fwspec->param_count)
+ return d->bus_token == bus_token;
+
return irqchip_data->chanidx == fwspec->param[1];
}
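For other irqchip drivers that provide select(), the same defensive pattern applies now that the core invokes select() even for pure domain-token searches. Below is a minimal sketch of the shape such a callback ends up with; the driver data layout and the parameter-matching helper are hypothetical.

static int example_irq_domain_select(struct irq_domain *d,
				     struct irq_fwspec *fwspec,
				     enum irq_domain_bus_token bus_token)
{
	/* Lookups against a different fwnode can never match this domain */
	if (fwspec->fwnode != d->fwnode)
		return false;

	/* Pure domain searches carry no parameters; match on the token only */
	if (!fwspec->param_count)
		return d->bus_token == bus_token;

	/* Otherwise decide based on the firmware-provided parameters */
	return example_fwspec_matches(d->host_data, fwspec);
}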
On 27/01/2024 16:17, Anup Patel wrote:
> From: Thomas Gleixner <[email protected]>
>
> Now that the GIC-v3 callback can handle invocation with a fwspec parameter
> count of 0 lift the restriction in the core code and invoke select()
> unconditionally when the domain provides it.
>
> Preparatory change for per device MSI domains.
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Anup Patel <[email protected]>
> ---
Hi Thomas/Anup
Currently when booting the kernel against next-master(next-20240222)
with Arm64 on Qualcomm boards RB5/DB845C, the kernel is resulting in
boot failures for our CI. I can send the full logs if required. Most
other boards seem to be fine.
A bisect (full log below) identified this patch as introducing the
failure. Bisected it on the tag "next-20240220" at repo
"https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
This works fine on Linux v6.8-rc5
Sample log from failure against run on RB5:
------
07:03:06.985934 <6>[ 1.727034] Trying to probe devices needed for
running init ...
07:03:16.905972 <6>[ 11.624040] platform 998000.serial: deferred
probe pending: platform: wait for supplier
/soc@0/pinctrl@f100000/qup-uart6-default-state
07:03:16.906250 <6>[ 11.636743] platform 1c08000.pcie: deferred probe
pending: platform: wait for supplier
/soc@0/pinctrl@f100000/pcie1-default-state
07:03:16.906400 <6>[ 11.648976] platform a90000.serial: deferred
probe pending: (reason unknown)
07:03:16.945462 <6>[ 11.656490] platform 1c10000.pcie: deferred probe
pending: platform: wait for supplier
/soc@0/pinctrl@f100000/pcie2-default-state
07:03:16.950476 <6>[ 11.668723] platform c440000.spmi: deferred probe
pending: platform: supplier b220000.interrupt-controller not ready
07:03:16.950635 <6>[ 11.679800] platform a6f8800.usb: deferred probe
pending: platform: supplier b220000.interrupt-controller not ready
07:03:16.950781 <6>[ 11.690778] platform a8f8800.usb: deferred probe
pending: platform: supplier b220000.interrupt-controller not ready
07:03:16.950923 <6>[ 11.701761] platform leds: deferred probe
pending: leds-gpio: Failed to get GPIO '/leds/led-user4'
07:03:16.989720 <6>[ 11.711226] platform f100000.pinctrl: deferred
probe pending: platform: supplier b220000.interrupt-controller not ready
07:03:16.994769 <6>[ 11.722567] platform 18591000.cpufreq: deferred
probe pending: qcom-cpufreq-hw: Failed to find icc paths
07:03:16.994929 <6>[ 11.732573] platform
b220000.interrupt-controller: deferred probe pending: (reason unknown)
07:03:16.995076 <4>[ 11.741438] qnoc-sm8250 1500000.interconnect:
sync_state() pending due to 1d84000.ufshc
07:03:17.034092 <4>[ 11.749935] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to 1d84000.ufshc
07:03:17.034331 <4>[ 11.758430] qnoc-sm8250 16e0000.interconnect:
sync_state() pending due to 1d84000.ufshc
07:03:17.039326 <4>[ 11.766916] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to 1dfa000.crypto
07:03:17.039482 <4>[ 11.775495] qnoc-sm8250 1700000.interconnect:
sync_state() pending due to 1dfa000.crypto
07:03:17.039623 <4>[ 11.784081] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to 9091000.pmu
07:03:17.039759 <4>[ 11.792389] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to 90b6400.pmu
07:03:17.078467 <4>[ 11.800705] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to 1d84000.ufshc
07:03:17.083560 <4>[ 11.809198] qnoc-sm8250 1500000.interconnect:
sync_state() pending due to a6f8800.usb
07:03:17.083720 <4>[ 11.817508] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to a6f8800.usb
07:03:17.083866 <4>[ 11.825825] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to a6f8800.usb
07:03:17.084006 <4>[ 11.834140] qnoc-sm8250 16e0000.interconnect:
sync_state() pending due to a6f8800.usb
07:03:17.122721 <4>[ 11.842456] qnoc-sm8250 1500000.interconnect:
sync_state() pending due to a8f8800.usb
07:03:17.127829 <4>[ 11.850766] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to a8f8800.usb
07:03:17.127989 <4>[ 11.859076] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to a8f8800.usb
07:03:17.128144 <4>[ 11.867388] qnoc-sm8250 16e0000.interconnect:
sync_state() pending due to a8f8800.usb
07:03:17.128286 <4>[ 11.875706] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to aa00000.video-codec
07:03:17.167089 <4>[ 11.884733] qnoc-sm8250 1740000.interconnect:
sync_state() pending due to aa00000.video-codec
07:03:17.172232 <4>[ 11.893760] qnoc-sm8250 1500000.interconnect:
sync_state() pending due to aa00000.video-codec
07:03:17.172404 <4>[ 11.902780] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to aa00000.video-codec
07:03:17.172564 <4>[ 11.911805] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to ae00000.display-subsystem
07:03:17.172705 <4>[ 11.921359] qnoc-sm8250 1740000.interconnect:
sync_state() pending due to ae00000.display-subsystem
07:03:17.211346 <4>[ 11.930932] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to
17300000.remoteproc
07:03:17.216527 <4>[ 11.940758] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to
ae00000.display-subsystem
07:03:17.216694 <4>[ 11.951113] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to
aa00000.video-codec
07:03:17.216840 <4>[ 11.960935] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to 8804000.mmc
07:03:17.255721 <4>[ 11.970059] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to
8300000.remoteproc
07:03:17.255962 <4>[ 11.979795] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to 884000.i2c
07:03:17.261054 <4>[ 11.988021] qnoc-sm8250 16e0000.interconnect:
sync_state() pending due to 884000.i2c
07:03:17.261220 <4>[ 11.996242] qnoc-sm8250 1500000.interconnect:
sync_state() pending due to 884000.i2c
07:03:17.261366 <4>[ 12.004465] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to 884000.i2c
07:03:17.261506 <4>[ 12.012691] qnoc-sm8250 interconnect-qup-virt:
sync_state() pending due to 884000.i2c
07:03:17.300099 <4>[ 12.021008] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to 884000.i2c
07:03:17.305306 <4>[ 12.030029] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to 980000.spi
07:03:17.305467 <4>[ 12.038254] qnoc-sm8250 1700000.interconnect:
sync_state() pending due to 980000.spi
07:03:17.305613 <4>[ 12.046479] qnoc-sm8250 1500000.interconnect:
sync_state() pending due to 980000.spi
07:03:17.305754 <4>[ 12.054705] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to 980000.spi
07:03:17.344314 <4>[ 12.062927] qnoc-sm8250 interconnect-qup-virt:
sync_state() pending due to 980000.spi
07:03:17.349541 <4>[ 12.071244] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to 980000.spi
07:03:17.349701 <4>[ 12.080272] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to 990000.i2c
07:03:17.349846 <4>[ 12.088494] qnoc-sm8250 1700000.interconnect:
sync_state() pending due to 990000.i2c
07:03:17.349986 <4>[ 12.096713] qnoc-sm8250 1500000.interconnect:
sync_state() pending due to 990000.i2c
07:03:17.388758 <4>[ 12.104937] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to 990000.i2c
07:03:17.388993 <4>[ 12.113158] qnoc-sm8250 interconnect-qup-virt:
sync_state() pending due to 990000.i2c
07:03:17.394156 <4>[ 12.121473] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to 990000.i2c
07:03:17.394314 <4>[ 12.130504] qnoc-sm8250 163d000.interconnect:
sync_state() pending due to 994000.i2c
07:03:17.394458 <4>[ 12.138733] qnoc-sm8250 1700000.interconnect:
sync_state() pending due to 994000.i2c
07:03:17.394598 <4>[ 12.146958] qnoc-sm8250 1500000.interconnect:
sync_state() pending due to 994000.i2c
07:03:17.433035 <4>[ 12.155179] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to 994000.i2c
07:03:17.438301 <4>[ 12.163405] qnoc-sm8250 interconnect-qup-virt:
sync_state() pending due to 994000.i2c
07:03:17.438461 <4>[ 12.171722] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to 994000.i2c
07:03:17.438607 <4>[ 12.180742] qnoc-sm8250 1500000.interconnect:
sync_state() pending due to 998000.serial
07:03:17.438748 <4>[ 12.189235] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to 998000.serial
07:03:17.477464 <4>[ 12.197719] qnoc-sm8250 interconnect-qup-virt:
sync_state() pending due to 998000.serial
07:03:17.482759 <4>[ 12.206299] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to 998000.serial
07:03:17.482918 <4>[ 12.215592] qnoc-sm8250 1500000.interconnect:
sync_state() pending due to a90000.serial
07:03:17.483065 <4>[ 12.224084] qnoc-sm8250 9100000.interconnect:
sync_state() pending due to a90000.serial
07:03:17.483207 <4>[ 12.232576] qnoc-sm8250 interconnect-qup-virt:
sync_state() pending due to a90000.serial
07:03:17.503624 <4>[ 12.241158] qcom-rpmhpd
18200000.rsc:power-controller: sync_state() pending due to a90000.serial
------
Bisect log:
------
git bisect start
# good: [b401b621758e46812da61fa58a67c3fd8d91de0d] Linux 6.8-rc5
git bisect good b401b621758e46812da61fa58a67c3fd8d91de0d
# bad: [2d5c7b7eb345249cb34d42cbc2b97b4c57ea944e] Add linux-next
specific files for 20240220
git bisect bad 2d5c7b7eb345249cb34d42cbc2b97b4c57ea944e
# good: [d0427d6bc95f2dae2595859f39c2de343479c909] Merge branch
'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
git bisect good d0427d6bc95f2dae2595859f39c2de343479c909
# good: [4c165a847139a6564d28e0f4d8e9fc9c67f22359] Merge branch
'for-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git
git bisect good 4c165a847139a6564d28e0f4d8e9fc9c67f22359
# bad: [4dfc8ee8540b799d604551c41c82a9e07089e20e] Merge branch
'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
git bisect bad 4dfc8ee8540b799d604551c41c82a9e07089e20e
# bad: [1fad63263f1650f15d5bd174391a53d3e600c327] Merge branch
'rcu/next' of
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
git bisect bad 1fad63263f1650f15d5bd174391a53d3e600c327
# bad: [0ced0254dca0bc06b09cfe31d6af411856379ea0] Merge branch into
tip/master: 'x86/vdso'
git bisect bad 0ced0254dca0bc06b09cfe31d6af411856379ea0
# good: [218b13db258c091e646857fc962ef45fe2163054] Merge branch
'x86/core' into x86/merge, to ease integration testing
git bisect good 218b13db258c091e646857fc962ef45fe2163054
# bad: [0b1902960678524b91f9ee3c94fc6561cfa1ead9] Merge branch into
tip/master: 'timers/ptp'
git bisect bad 0b1902960678524b91f9ee3c94fc6561cfa1ead9
# bad: [fa4d750326d50e3cc7801ada3d641daf14a4cb9d] Merge branch into
tip/master: 'irq/msi'
git bisect bad fa4d750326d50e3cc7801ada3d641daf14a4cb9d
# good: [9e04f6432c7ebaf33d5ce9a6e15bc544da316e54] Merge branch into
tip/master: 'irq/core'
git bisect good 9e04f6432c7ebaf33d5ce9a6e15bc544da316e54
# bad: [1a4671ff7a903e87e4e76213e200bb8bcfa942e4] platform-msi: Remove
unused interfaces
git bisect bad 1a4671ff7a903e87e4e76213e200bb8bcfa942e4
# bad: [ac81e94ab001c2882e89c9b61417caea64b800df] genirq/msi: Extend
msi_parent_ops
git bisect bad ac81e94ab001c2882e89c9b61417caea64b800df
# bad: [de1ff306dcf4546d6a8863b1f956335e0d3fbb9b] genirq/irqdomain:
Remove the param count restriction from select()
git bisect bad de1ff306dcf4546d6a8863b1f956335e0d3fbb9b
# good: [15137825100422c4c393c87af5aa5a8fa297b1f3] irqchip/gic-v3: Make
gic_irq_domain_select() robust for zero parameter count
git bisect good 15137825100422c4c393c87af5aa5a8fa297b1f3
# first bad commit: [de1ff306dcf4546d6a8863b1f956335e0d3fbb9b]
genirq/irqdomain: Remove the param count restriction from select()
------
Thanks,
Aishwarya
On Thu, 22 Feb 2024 13:01:32 +0000,
Aishwarya TCV <[email protected]> wrote:
>
>
>
> On 27/01/2024 16:17, Anup Patel wrote:
> > From: Thomas Gleixner <[email protected]>
> >
> > Now that the GIC-v3 callback can handle invocation with a fwspec parameter
> > count of 0 lift the restriction in the core code and invoke select()
> > unconditionally when the domain provides it.
> >
> > Preparatory change for per device MSI domains.
> >
> > Signed-off-by: Thomas Gleixner <[email protected]>
> > Signed-off-by: Anup Patel <[email protected]>
> > ---
>
> Hi Thomas/Anup
>
> Currently when booting the kernel against next-master(next-20240222)
> with Arm64 on Qualcomm boards RB5/DB845C, the kernel is resulting in
> boot failures for our CI. I can send the full logs if required. Most
> other boards seem to be fine.
>
> A bisect (full log below) identified this patch as introducing the
> failure. Bisected it on the tag "next-20240220" at repo
> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>
> This works fine on Linux v6.8-rc5
Can you please try [1]?
M.
[1] https://lore.kernel.org/linux-kernel/[email protected]
--
Without deviation from the norm, progress is not possible.
On 22/02/2024 16:28, Marc Zyngier wrote:
> On Thu, 22 Feb 2024 13:01:32 +0000,
> Aishwarya TCV <[email protected]> wrote:
>>
>>
>>
>> On 27/01/2024 16:17, Anup Patel wrote:
>>> From: Thomas Gleixner <[email protected]>
>>>
>>> Now that the GIC-v3 callback can handle invocation with a fwspec parameter
>>> count of 0 lift the restriction in the core code and invoke select()
>>> unconditionally when the domain provides it.
>>>
>>> Preparatory change for per device MSI domains.
>>>
>>> Signed-off-by: Thomas Gleixner <[email protected]>
>>> Signed-off-by: Anup Patel <[email protected]>
>>> ---
>>
>> Hi Thomas/Anup
>>
>> Currently when booting the kernel against next-master(next-20240222)
>> with Arm64 on Qualcomm boards RB5/DB845C, the kernel is resulting in
>> boot failures for our CI. I can send the full logs if required. Most
>> other boards seem to be fine.
>>
>> A bisect (full log below) identified this patch as introducing the
>> failure. Bisected it on the tag "next-20240220" at repo
>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>
>> This works fine on Linux v6.8-rc5
>
> Can you please try [1]?
>
> M.
>
> [1] https://lore.kernel.org/linux-kernel/[email protected]
>
With the patch [1] applied on next-master (next-20240222), I successfully
tested booting the kernel with Arm64 on the Qualcomm boards RB5/DB845C.
I can confirm that the patch resolves the boot issue on RB5/DB845C.
Thanks
Aishwarya
Dear All,
On 27.01.2024 17:17, Anup Patel wrote:
> From: Thomas Gleixner <[email protected]>
>
> Now that the GIC-v3 callback can handle invocation with a fwspec parameter
> count of 0 lift the restriction in the core code and invoke select()
> unconditionally when the domain provides it.
>
> Preparatory change for per device MSI domains.
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Anup Patel <[email protected]>
This patch landed recently in linux-next (next-20240221) as commit
de1ff306dcf4 ("genirq/irqdomain: Remove the param count restriction from
select()"). I've noticed that it breaks booting of Qualcomm's Robotics
RB5 ARM64 board (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). Booting
freezes after "clk: Disabling unused clocks", but this is probably a
consequence of some earlier failure. Reverting $subject on top of
next-20240221 fixes this problem. Let me know how I can help debug
this issue.
> ---
> kernel/irq/irqdomain.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 0bdef4fe925b..8fee37918195 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -448,7 +448,7 @@ struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
> */
> mutex_lock(&irq_domain_mutex);
> list_for_each_entry(h, &irq_domain_list, link) {
> - if (h->ops->select && fwspec->param_count)
> + if (h->ops->select)
> rc = h->ops->select(h, fwspec, bus_token);
> else if (h->ops->match)
> rc = h->ops->match(h, to_of_node(fwnode), bus_token);
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Hi Marek Szyprowski,
> -----Original Message-----
> From: linux-arm-kernel <[email protected]> On
> Behalf Of Marek Szyprowski
> Sent: Friday, February 23, 2024 10:23 AM
> Subject: Re: [PATCH v12 02/25] genirq/irqdomain: Remove the param count
> restriction from select()
>
> Dear All,
>
> On 27.01.2024 17:17, Anup Patel wrote:
> > From: Thomas Gleixner <[email protected]>
> >
> > Now that the GIC-v3 callback can handle invocation with a fwspec
> > parameter count of 0 lift the restriction in the core code and invoke
> > select() unconditionally when the domain provides it.
> >
> > Preparatory change for per device MSI domains.
> >
> > Signed-off-by: Thomas Gleixner <[email protected]>
> > Signed-off-by: Anup Patel <[email protected]>
>
>
> This patch landed recently in linux-next (next-20240221) as commit
> de1ff306dcf4 ("genirq/irqdomain: Remove the param count restriction from
> select()"). I've noticed that it breaks booting of Qualcomm's Robotics
> RB5 ARM64 board (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). Booting
> freezes after "clk: Disabling unused clocks", but this is probably a
> consequence of some earlier failure. Reverting $subject on top of
> next-20240221 fixes this problem. Let me know how can I help debugging
> this issue.
>
>
> > ---
> > kernel/irq/irqdomain.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index
> > 0bdef4fe925b..8fee37918195 100644
> > --- a/kernel/irq/irqdomain.c
> > +++ b/kernel/irq/irqdomain.c
> > @@ -448,7 +448,7 @@ struct irq_domain *irq_find_matching_fwspec(struct
> irq_fwspec *fwspec,
> > */
> > mutex_lock(&irq_domain_mutex);
> > list_for_each_entry(h, &irq_domain_list, link) {
> > - if (h->ops->select && fwspec->param_count)
> > + if (h->ops->select)
> > rc = h->ops->select(h, fwspec, bus_token);
> > else if (h->ops->match)
> > rc = h->ops->match(h, to_of_node(fwnode), bus_token);
This patch appears to have been reverted in today's next. But there was a fix for the issue you mentioned [1].
[1] https://lore.kernel.org/all/170844679345.398.17551290253758129895.tip-bot2@tip-bot2/
Cheers,
Biju
On 23.02.2024 11:45, Biju Das wrote:
>> On 27.01.2024 17:17, Anup Patel wrote:
>>> From: Thomas Gleixner <[email protected]>
>>>
>>> Now that the GIC-v3 callback can handle invocation with a fwspec
>>> parameter count of 0 lift the restriction in the core code and invoke
>>> select() unconditionally when the domain provides it.
>>>
>>> Preparatory change for per device MSI domains.
>>>
>>> Signed-off-by: Thomas Gleixner <[email protected]>
>>> Signed-off-by: Anup Patel <[email protected]>
>>
>> This patch landed recently in linux-next (next-20240221) as commit
>> de1ff306dcf4 ("genirq/irqdomain: Remove the param count restriction from
>> select()"). I've noticed that it breaks booting of Qualcomm's Robotics
>> RB5 ARM64 board (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). Booting
>> freezes after "clk: Disabling unused clocks", but this is probably a
>> consequence of some earlier failure. Reverting $subject on top of
>> next-20240221 fixes this problem. Let me know how can I help debugging
>> this issue.
>>
>>
>>> ---
>>> kernel/irq/irqdomain.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index
>>> 0bdef4fe925b..8fee37918195 100644
>>> --- a/kernel/irq/irqdomain.c
>>> +++ b/kernel/irq/irqdomain.c
>>> @@ -448,7 +448,7 @@ struct irq_domain *irq_find_matching_fwspec(struct
>> irq_fwspec *fwspec,
>>> */
>>> mutex_lock(&irq_domain_mutex);
>>> list_for_each_entry(h, &irq_domain_list, link) {
>>> - if (h->ops->select && fwspec->param_count)
>>> + if (h->ops->select)
>>> rc = h->ops->select(h, fwspec, bus_token);
>>> else if (h->ops->match)
>>> rc = h->ops->match(h, to_of_node(fwnode), bus_token);
> This patch looks reverted on todays's next. But there was a fix for fixing the issue you mentioned [1]
>
> [1] https://lore.kernel.org/all/170844679345.398.17551290253758129895.tip-bot2@tip-bot2/
Thanks! Today's next seems to be broken on ARM64 (doesn't compile here),
so I've missed it.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Hi Marek Szyprowski,
> -----Original Message-----
> From: Marek Szyprowski <[email protected]>
> Sent: Friday, February 23, 2024 10:57 AM
> Subject: Re: [PATCH v12 02/25] genirq/irqdomain: Remove the param count
> restriction from select()
>
> On 23.02.2024 11:45, Biju Das wrote:
> >> On 27.01.2024 17:17, Anup Patel wrote:
> >>> From: Thomas Gleixner <[email protected]>
> >>>
> >>> Now that the GIC-v3 callback can handle invocation with a fwspec
> >>> parameter count of 0 lift the restriction in the core code and
> >>> invoke
> >>> select() unconditionally when the domain provides it.
> >>>
> >>> Preparatory change for per device MSI domains.
> >>>
> >>> Signed-off-by: Thomas Gleixner <[email protected]>
> >>> Signed-off-by: Anup Patel <[email protected]>
> >>
> >> This patch landed recently in linux-next (next-20240221) as commit
> >> de1ff306dcf4 ("genirq/irqdomain: Remove the param count restriction
> >> from select()"). I've noticed that it breaks booting of Qualcomm's
> >> Robotics
> >> RB5 ARM64 board (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). Booting
> >> freezes after "clk: Disabling unused clocks", but this is probably a
> >> consequence of some earlier failure. Reverting $subject on top of
> >> next-20240221 fixes this problem. Let me know how can I help
> >> debugging this issue.
> >>
> >>
> >>> ---
> >>> kernel/irq/irqdomain.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index
> >>> 0bdef4fe925b..8fee37918195 100644
> >>> --- a/kernel/irq/irqdomain.c
> >>> +++ b/kernel/irq/irqdomain.c
> >>> @@ -448,7 +448,7 @@ struct irq_domain
> >>> *irq_find_matching_fwspec(struct
> >> irq_fwspec *fwspec,
> >>> */
> >>> mutex_lock(&irq_domain_mutex);
> >>> list_for_each_entry(h, &irq_domain_list, link) {
> >>> - if (h->ops->select && fwspec->param_count)
> >>> + if (h->ops->select)
> >>> rc = h->ops->select(h, fwspec, bus_token);
> >>> else if (h->ops->match)
> >>> rc = h->ops->match(h, to_of_node(fwnode),
> bus_token);
> > This patch looks reverted on todays's next. But there was a fix for
> > fixing the issue you mentioned [1]
> >
> > [1]
> > https://lore.kernel.org/all/170844679345.398.17551290253758129895.tip-bot2@tip-bot2/
>
> Thanks! Today's next seems to be broken on ARM64 (doesn't compile here),
> so I've missed it.
FYI, ARM64 defconfig works for me.
Linux smarc-rzg2l 6.8.0-rc5-next-20240223-gfebe82167ab7 #1430 SMP PREEMPT Fri Feb 23 07:41:52 GMT 2024 aarch64 GNU/Linux
Cheers,
Biju