2022-11-14 10:07:19

by Anup Patel

Subject: [PATCH v11 0/7] RISC-V IPI Improvements

This series aims to improve IPI support in Linux RISC-V in the following ways:
1) Treat IPIs as normal per-CPU interrupts instead of having custom RISC-V
specific hooks. This also makes Linux RISC-V IPI support aligned with
other architectures.
2) Remote TLB flushes and icache flushes should prefer local IPIs instead
of SBI calls whenever we have specialized hardware (such as the RISC-V AIA
IMSIC or RISC-V SWI) which allows S-mode software to directly inject
IPIs without any assistance from M-mode runtime firmware (see the sketch
below).
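
For illustration, the resulting dispatch pattern (taken from the icache
flush patch in this series; riscv_use_ipi_for_rfence() is introduced by
PATCH5) looks roughly like this:

  static void ipi_remote_fence_i(void *info)
  {
          local_flush_icache_all();
  }

  void flush_icache_all(void)
  {
          local_flush_icache_all();

          /* Fall back to firmware only when direct S-mode IPIs are unavailable */
          if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
                  sbi_remote_fence_i(NULL);
          else
                  on_each_cpu(ipi_remote_fence_i, NULL, 1);
  }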

These patches were originally part of the "Linux RISC-V ACLINT Support"
series but are now a separate series so that they can be merged independently
of the "Linux RISC-V ACLINT Support" series.
(Refer: https://lore.kernel.org/lkml/[email protected]/)

These patches are also preparatory patches for the upcoming:
1) Linux RISC-V AIA support
2) Linux RISC-V SWI support

These patches can also be found in the riscv_ipi_imp_v11 branch at:
https://github.com/avpatel/linux.git

Changes since v10:
- Rebased on Linux-6.1-rc5
- Drop the "!(pending & ibit)" check in ipi_mux_send_mask() of PATCH3
- Disable local interrupts in ipi_mux_send_mask() of PATCH3 because we
can be preempted while using a per-CPU temporary variable.

Changes since v9:
- Rebased on Linux-6.1-rc3
- Updated header comment block of ipi-mux.c in PATCH3
- Use a struct for global data of ipi-mux.c in PATCH3
- Add per-CPU temp cpumask for sending IPIs in PATCH3
- Drop the use of fwspec in PATCH3
- Use static key for ipi_mux_pre_handle() and ipi_mux_post_handle()
in PATCH3
- Remove redundant pr_warn_ratelimited() called by ipi_mux_process()
in PATCH3
- Remove CPUHP thingy from ipi_mux_create() in PATCH3

Changes since v8:
- Rebased on Linux-6.0-rc3
- Use dummy percpu data as parameter for request_percpu_irq() in PATCH4.

Changes since v7:
- Rebased on Linux-6.0-rc1
- Use atomic operations to track per-CPU pending and enabled IPIs in PATCH3.
(Note: this is inspired by the IPI muxing implemented in
drivers/irqchip/irq-apple-aic.c)
- Made "struct ipi_mux_ops" (added by PATCH3) flexible so that
drivers/irqchip/irq-apple-aic.c can adopt it in future.

Changes since v6:
- Rebased on Linux-5.19-rc7
- Added documentation for struct ipi_mux_ops in PATCH3
- Dropped dummy irq_mask()/unmask() in PATCH3
- Added const for "ipi_mux_chip" in PATCH3
- Removed "type" initialization from ipi_mux_domain_alloc() in PATCH3
- Dropped translate() from "ipi_mux_domain_ops" in PATCH3
- Improved barrier documentation in ipi_mux_process() of PATCH3
- Added percpu check in ipi_mux_create() for parent_virq of PATCH3
- Added nr_ipi parameter in ipi_mux_create() of PATCH3

Changes since v5:
- Rebased on Linux-5.18-rc3
- Used kernel doc style in PATCH3
- Removed redundant loop in ipi_mux_process() of PATCH3
- Removed "RISC-V" prefix from ipi_mux_chip.name of PATCH3
- Removed use of "this patch" in PATCH3 commit description
- Addressed few other nit comments in PATCH3

Changes since v4:
- Rebased on Linux-5.17
- Includes new PATCH3 which adds mechanism to multiplex a single HW IPI

Changes since v3:
- Rebased on Linux-5.17-rc6
- Updated PATCH2 to not export riscv_set_intc_hwnode_fn()
- Simplified riscv_intc_hwnode() in PATCH2

Changes since v2:
- Rebased on Linux-5.17-rc4
- Updated PATCH2 to not create synthetic INTC fwnode and instead provide
a function which allows drivers to directly discover INTC fwnode

Changes since v1:
- Use synthetic fwnode for INTC instead of irq_set_default_host() in PATCH2

Anup Patel (7):
RISC-V: Clear SIP bit only when using SBI IPI operations
irqchip/riscv-intc: Allow drivers to directly discover INTC hwnode
genirq: Add mechanism to multiplex a single HW IPI
RISC-V: Treat IPIs as normal Linux IRQs
RISC-V: Allow marking IPIs as suitable for remote FENCEs
RISC-V: Use IPIs for remote TLB flush when possible
RISC-V: Use IPIs for remote icache flush when possible

arch/riscv/Kconfig | 2 +
arch/riscv/include/asm/irq.h | 4 +
arch/riscv/include/asm/sbi.h | 7 +
arch/riscv/include/asm/smp.h | 49 ++++--
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/cpu-hotplug.c | 3 +-
arch/riscv/kernel/irq.c | 21 ++-
arch/riscv/kernel/sbi-ipi.c | 80 +++++++++
arch/riscv/kernel/sbi.c | 11 --
arch/riscv/kernel/smp.c | 166 +++++++++---------
arch/riscv/kernel/smpboot.c | 5 +-
arch/riscv/mm/cacheflush.c | 5 +-
arch/riscv/mm/tlbflush.c | 93 +++++++++--
drivers/clocksource/timer-clint.c | 43 +++--
drivers/irqchip/irq-riscv-intc.c | 60 +++----
include/linux/irq.h | 18 ++
kernel/irq/Kconfig | 5 +
kernel/irq/Makefile | 1 +
kernel/irq/ipi-mux.c | 268 ++++++++++++++++++++++++++++++
19 files changed, 675 insertions(+), 167 deletions(-)
create mode 100644 arch/riscv/kernel/sbi-ipi.c
create mode 100644 kernel/irq/ipi-mux.c

--
2.34.1



2022-11-14 10:09:07

by Anup Patel

Subject: [PATCH v11 3/7] genirq: Add mechanism to multiplex a single HW IPI

All RISC-V platforms have a single HW IPI provided by the INTC local
interrupt controller. The HW method to trigger the INTC IPI can be through
an external irqchip (e.g. RISC-V AIA), through a platform-specific device
(e.g. the SiFive CLINT timer), or through firmware (e.g. the SBI IPI call).

To support multiple IPIs on RISC-V, we add a generic IPI multiplexing
mechanism which helps us create multiple virtual IPIs using a single
HW IPI. This generic IPI multiplexing is inspired by the Apple AIC
irqchip driver and is shared by various RISC-V irqchip drivers.
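
As a rough usage sketch (hypothetical driver code, not part of this patch;
my_hw_ipi_send(), my_ipi_mux_ops and my_irqchip_init() are made-up names,
and the riscv_ipi_set_virq_range() signature is the three-argument form
from PATCH5), an irqchip driver would consume this API along these lines:

  static void my_hw_ipi_send(unsigned int parent_virq, void *data,
                             const struct cpumask *mask)
  {
          /* Trigger the single HW IPI on every CPU in @mask (e.g. an MMIO write) */
  }

  static const struct ipi_mux_ops my_ipi_mux_ops = {
          .ipi_mux_send = my_hw_ipi_send,
  };

  static int __init my_irqchip_init(unsigned int parent_virq)
  {
          int first_virq;

          /* parent_virq must be a per-CPU IRQ backing the single HW IPI */
          first_virq = ipi_mux_create(parent_virq, 8, &my_ipi_mux_ops, NULL);
          if (first_virq <= 0)
                  return first_virq ? first_virq : -ENODEV;

          /* Hand the 8 virtual IPIs over to the architecture code */
          riscv_ipi_set_virq_range(first_virq, 8, false);
          return 0;
  }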

Signed-off-by: Anup Patel <[email protected]>
---
include/linux/irq.h | 18 +++
kernel/irq/Kconfig | 5 +
kernel/irq/Makefile | 1 +
kernel/irq/ipi-mux.c | 268 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 292 insertions(+)
create mode 100644 kernel/irq/ipi-mux.c

diff --git a/include/linux/irq.h b/include/linux/irq.h
index c3eb89606c2b..5ab702cb0a5b 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -1266,6 +1266,24 @@ int __ipi_send_mask(struct irq_desc *desc, const struct cpumask *dest);
int ipi_send_single(unsigned int virq, unsigned int cpu);
int ipi_send_mask(unsigned int virq, const struct cpumask *dest);

+/**
+ * struct ipi_mux_ops - IPI multiplex operations
+ *
+ * @ipi_mux_pre_handle: Optional function called before handling parent IPI
+ * @ipi_mux_post_handle: Optional function called after handling parent IPI
+ * @ipi_mux_send: Trigger parent IPI on target CPUs
+ */
+struct ipi_mux_ops {
+ void (*ipi_mux_pre_handle)(unsigned int parent_virq, void *data);
+ void (*ipi_mux_post_handle)(unsigned int parent_virq, void *data);
+ void (*ipi_mux_send)(unsigned int parent_virq, void *data,
+ const struct cpumask *mask);
+};
+
+void ipi_mux_process(void);
+int ipi_mux_create(unsigned int parent_virq, unsigned int nr_ipi,
+ const struct ipi_mux_ops *ops, void *data);
+
#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
/*
* Registers a generic IRQ handling function as the top-level IRQ handler in
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index db3d174c53d4..df17dbc54b02 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -86,6 +86,11 @@ config GENERIC_IRQ_IPI
depends on SMP
select IRQ_DOMAIN_HIERARCHY

+# Generic IRQ IPI Mux support
+config GENERIC_IRQ_IPI_MUX
+ bool
+ depends on SMP
+
# Generic MSI interrupt support
config GENERIC_MSI_IRQ
bool
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index b4f53717d143..f19d3080bf11 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_GENERIC_IRQ_MIGRATION) += cpuhotplug.o
obj-$(CONFIG_PM_SLEEP) += pm.o
obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
+obj-$(CONFIG_GENERIC_IRQ_IPI_MUX) += ipi-mux.o
obj-$(CONFIG_SMP) += affinity.o
obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
diff --git a/kernel/irq/ipi-mux.c b/kernel/irq/ipi-mux.c
new file mode 100644
index 000000000000..259e00366dd7
--- /dev/null
+++ b/kernel/irq/ipi-mux.c
@@ -0,0 +1,268 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Multiplex several virtual IPIs over a single HW IPI.
+ *
+ * Copyright The Asahi Linux Contributors
+ * Copyright (c) 2022 Ventana Micro Systems Inc.
+ */
+
+#define pr_fmt(fmt) "ipi-mux: " fmt
+#include <linux/cpu.h>
+#include <linux/init.h>
+#include <linux/irq.h>
+#include <linux/irqchip.h>
+#include <linux/irqchip/chained_irq.h>
+#include <linux/irqdomain.h>
+#include <linux/jump_label.h>
+#include <linux/percpu.h>
+#include <linux/smp.h>
+
+struct ipi_mux_cpu {
+ atomic_t enable;
+ atomic_t bits;
+ struct cpumask send_mask;
+};
+
+struct ipi_mux_control {
+ void *data;
+ unsigned int nr;
+ unsigned int parent_virq;
+ struct irq_domain *domain;
+ const struct ipi_mux_ops *ops;
+ struct ipi_mux_cpu __percpu *cpu;
+};
+
+static struct ipi_mux_control *imux;
+static DEFINE_STATIC_KEY_FALSE(imux_pre_handle);
+static DEFINE_STATIC_KEY_FALSE(imux_post_handle);
+
+static void ipi_mux_mask(struct irq_data *d)
+{
+ struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
+
+ atomic_andnot(BIT(irqd_to_hwirq(d)), &icpu->enable);
+}
+
+static void ipi_mux_unmask(struct irq_data *d)
+{
+ u32 ibit = BIT(irqd_to_hwirq(d));
+ struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
+
+ atomic_or(ibit, &icpu->enable);
+
+ /*
+ * The atomic_or() above must complete before the atomic_read()
+ * below to avoid racing ipi_mux_send_mask().
+ */
+ smp_mb__after_atomic();
+
+ /* If a pending IPI was unmasked, raise a parent IPI immediately. */
+ if (atomic_read(&icpu->bits) & ibit)
+ imux->ops->ipi_mux_send(imux->parent_virq, imux->data,
+ cpumask_of(smp_processor_id()));
+}
+
+static void ipi_mux_send_mask(struct irq_data *d, const struct cpumask *mask)
+{
+ u32 ibit = BIT(irqd_to_hwirq(d));
+ struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
+ struct cpumask *send_mask = &icpu->send_mask;
+ unsigned long flags;
+ int cpu;
+
+ /*
+ * We use send_mask as a per-CPU variable so disable local
+ * interrupts to avoid being preempted.
+ */
+ local_irq_save(flags);
+
+ cpumask_clear(send_mask);
+
+ for_each_cpu(cpu, mask) {
+ icpu = per_cpu_ptr(imux->cpu, cpu);
+ atomic_or(ibit, &icpu->bits);
+
+ /*
+ * The atomic_or() above must complete before
+ * the atomic_read() below to avoid racing with
+ * ipi_mux_unmask().
+ */
+ smp_mb__after_atomic();
+
+ if (atomic_read(&icpu->enable) & ibit)
+ cpumask_set_cpu(cpu, send_mask);
+ }
+
+ /* Trigger the parent IPI */
+ imux->ops->ipi_mux_send(imux->parent_virq, imux->data, send_mask);
+
+ local_irq_restore(flags);
+}
+
+static const struct irq_chip ipi_mux_chip = {
+ .name = "IPI Mux",
+ .irq_mask = ipi_mux_mask,
+ .irq_unmask = ipi_mux_unmask,
+ .ipi_send_mask = ipi_mux_send_mask,
+};
+
+static int ipi_mux_domain_alloc(struct irq_domain *d, unsigned int virq,
+ unsigned int nr_irqs, void *arg)
+{
+ int i;
+
+ for (i = 0; i < nr_irqs; i++) {
+ irq_set_percpu_devid(virq + i);
+ irq_domain_set_info(d, virq + i, i,
+ &ipi_mux_chip, d->host_data,
+ handle_percpu_devid_irq, NULL, NULL);
+ }
+
+ return 0;
+}
+
+static const struct irq_domain_ops ipi_mux_domain_ops = {
+ .alloc = ipi_mux_domain_alloc,
+ .free = irq_domain_free_irqs_top,
+};
+
+/**
+ * ipi_mux_process - Process multiplexed virtual IPIs
+ */
+void ipi_mux_process(void)
+{
+ struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
+ irq_hw_number_t hwirq;
+ unsigned long ipis;
+ int en;
+
+ if (static_branch_unlikely(&imux_pre_handle))
+ imux->ops->ipi_mux_pre_handle(imux->parent_virq, imux->data);
+
+ /*
+ * Reading the enable mask does not need to be ordered as long as
+ * this function is called from the interrupt handler, because only
+ * the CPU itself can change its own enable mask.
+ */
+ en = atomic_read(&icpu->enable);
+
+ /*
+ * Clear the IPIs we are about to handle. This pairs with the
+ * atomic_or() in ipi_mux_send_mask().
+ */
+ ipis = atomic_fetch_andnot(en, &icpu->bits) & en;
+
+ for_each_set_bit(hwirq, &ipis, imux->nr)
+ generic_handle_domain_irq(imux->domain, hwirq);
+
+ if (static_branch_unlikely(&imux_post_handle))
+ imux->ops->ipi_mux_post_handle(imux->parent_virq, imux->data);
+}
+
+static void ipi_mux_handler(struct irq_desc *desc)
+{
+ struct irq_chip *chip = irq_desc_get_chip(desc);
+
+ chained_irq_enter(chip, desc);
+ ipi_mux_process();
+ chained_irq_exit(chip, desc);
+}
+
+/**
+ * ipi_mux_create - Create virtual IPIs multiplexed on top of a single
+ * parent IPI.
+ * @parent_virq: virq of the parent per-CPU IRQ
+ * @nr_ipi: number of virtual IPIs to create. This should
+ * be <= BITS_PER_TYPE(int)
+ * @ops: multiplexing operations for the parent IPI
+ * @data: opaque data used by the multiplexing operations
+ *
+ * If the parent IPI > 0 then ipi_mux_process() will be automatically
+ * called via a chained handler.
+ *
+ * If the parent IPI <= 0 then it is the responsibility of irqchip drivers
+ * to explicitly call ipi_mux_process() for processing muxed IPIs.
+ *
+ * Returns the first virq of the newly created virtual IPIs upon success
+ * or <= 0 upon failure
+ */
+int ipi_mux_create(unsigned int parent_virq, unsigned int nr_ipi,
+ const struct ipi_mux_ops *ops, void *data)
+{
+ struct fwnode_handle *fwnode;
+ struct irq_domain *domain;
+ int rc;
+
+ if (imux)
+ return -EEXIST;
+
+ if (BITS_PER_TYPE(int) < nr_ipi || !ops || !ops->ipi_mux_send)
+ return -EINVAL;
+
+ if (parent_virq &&
+ !irqd_is_per_cpu(irq_desc_get_irq_data(irq_to_desc(parent_virq))))
+ return -EINVAL;
+
+ imux = kzalloc(sizeof(*imux), GFP_KERNEL);
+ if (!imux)
+ return -ENOMEM;
+
+ imux->cpu = alloc_percpu(typeof(*imux->cpu));
+ if (!imux->cpu) {
+ rc = -ENOMEM;
+ goto fail_free_mux;
+ }
+
+ fwnode = irq_domain_alloc_named_fwnode("IPI-Mux");
+ if (!fwnode) {
+ pr_err("unable to create IPI Mux fwnode\n");
+ rc = -ENOMEM;
+ goto fail_free_cpu;
+ }
+
+ domain = irq_domain_create_simple(fwnode, nr_ipi, 0,
+ &ipi_mux_domain_ops, NULL);
+ if (!domain) {
+ pr_err("unable to add IPI Mux domain\n");
+ rc = -ENOMEM;
+ goto fail_free_fwnode;
+ }
+
+ domain->flags |= IRQ_DOMAIN_FLAG_IPI_SINGLE;
+ irq_domain_update_bus_token(domain, DOMAIN_BUS_IPI);
+
+ rc = __irq_domain_alloc_irqs(domain, -1, nr_ipi,
+ NUMA_NO_NODE, NULL, false, NULL);
+ if (rc <= 0) {
+ pr_err("unable to alloc IRQs from IPI Mux domain\n");
+ goto fail_free_domain;
+ }
+
+ imux->domain = domain;
+ imux->data = data;
+ imux->nr = nr_ipi;
+ imux->parent_virq = parent_virq;
+ imux->ops = ops;
+
+ if (imux->ops->ipi_mux_pre_handle)
+ static_branch_enable(&imux_pre_handle);
+
+ if (imux->ops->ipi_mux_post_handle)
+ static_branch_enable(&imux_post_handle);
+
+ if (parent_virq > 0)
+ irq_set_chained_handler(parent_virq, ipi_mux_handler);
+
+ return rc;
+
+fail_free_domain:
+ irq_domain_remove(domain);
+fail_free_fwnode:
+ irq_domain_free_fwnode(fwnode);
+fail_free_cpu:
+ free_percpu(imux->cpu);
+fail_free_mux:
+ kfree(imux);
+ imux = NULL;
+ return rc;
+}
--
2.34.1


2022-11-14 10:11:12

by Anup Patel

Subject: [PATCH v11 7/7] RISC-V: Use IPIs for remote icache flush when possible

If we have a specialized interrupt controller (such as the AIA IMSIC)
which allows supervisor mode to directly inject IPIs without any
assistance from M-mode or HS-mode, we can do remote icache flushes
directly from supervisor mode instead of using SBI RFENCE calls.

This patch extends the remote icache flush functions to use supervisor
mode IPIs whenever direct supervisor mode IPIs are supported by the
interrupt controller.

Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
---
arch/riscv/mm/cacheflush.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 57b40a350420..f10cb47eac3a 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -19,7 +19,7 @@ void flush_icache_all(void)
{
local_flush_icache_all();

- if (IS_ENABLED(CONFIG_RISCV_SBI))
+ if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
sbi_remote_fence_i(NULL);
else
on_each_cpu(ipi_remote_fence_i, NULL, 1);
@@ -67,7 +67,8 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
* with flush_icache_deferred().
*/
smp_mb();
- } else if (IS_ENABLED(CONFIG_RISCV_SBI)) {
+ } else if (IS_ENABLED(CONFIG_RISCV_SBI) &&
+ !riscv_use_ipi_for_rfence()) {
sbi_remote_fence_i(&others);
} else {
on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
--
2.34.1


2022-11-14 10:37:42

by Anup Patel

Subject: [PATCH v11 6/7] RISC-V: Use IPIs for remote TLB flush when possible

If we have a specialized interrupt controller (such as the AIA IMSIC)
which allows supervisor mode to directly inject IPIs without any
assistance from M-mode or HS-mode, we can do remote TLB flushes
directly from supervisor mode instead of using SBI RFENCE calls.

This patch extends the remote TLB flush functions to use supervisor
mode IPIs whenever direct supervisor mode IPIs are supported by the
interrupt controller.

Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
---
arch/riscv/mm/tlbflush.c | 93 +++++++++++++++++++++++++++++++++-------
1 file changed, 78 insertions(+), 15 deletions(-)

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 37ed760d007c..27a7db8eb2c4 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -23,14 +23,62 @@ static inline void local_flush_tlb_page_asid(unsigned long addr,
: "memory");
}

+static inline void local_flush_tlb_range(unsigned long start,
+ unsigned long size, unsigned long stride)
+{
+ if (size <= stride)
+ local_flush_tlb_page(start);
+ else
+ local_flush_tlb_all();
+}
+
+static inline void local_flush_tlb_range_asid(unsigned long start,
+ unsigned long size, unsigned long stride, unsigned long asid)
+{
+ if (size <= stride)
+ local_flush_tlb_page_asid(start, asid);
+ else
+ local_flush_tlb_all_asid(asid);
+}
+
+static void __ipi_flush_tlb_all(void *info)
+{
+ local_flush_tlb_all();
+}
+
void flush_tlb_all(void)
{
- sbi_remote_sfence_vma(NULL, 0, -1);
+ if (riscv_use_ipi_for_rfence())
+ on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
+ else
+ sbi_remote_sfence_vma(NULL, 0, -1);
+}
+
+struct flush_tlb_range_data {
+ unsigned long asid;
+ unsigned long start;
+ unsigned long size;
+ unsigned long stride;
+};
+
+static void __ipi_flush_tlb_range_asid(void *info)
+{
+ struct flush_tlb_range_data *d = info;
+
+ local_flush_tlb_range_asid(d->start, d->size, d->stride, d->asid);
+}
+
+static void __ipi_flush_tlb_range(void *info)
+{
+ struct flush_tlb_range_data *d = info;
+
+ local_flush_tlb_range(d->start, d->size, d->stride);
}

-static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
- unsigned long size, unsigned long stride)
+static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
+ unsigned long size, unsigned long stride)
{
+ struct flush_tlb_range_data ftd;
struct cpumask *cmask = mm_cpumask(mm);
unsigned int cpuid;
bool broadcast;
@@ -45,19 +93,34 @@ static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
unsigned long asid = atomic_long_read(&mm->context.id);

if (broadcast) {
- sbi_remote_sfence_vma_asid(cmask, start, size, asid);
- } else if (size <= stride) {
- local_flush_tlb_page_asid(start, asid);
+ if (riscv_use_ipi_for_rfence()) {
+ ftd.asid = asid;
+ ftd.start = start;
+ ftd.size = size;
+ ftd.stride = stride;
+ on_each_cpu_mask(cmask,
+ __ipi_flush_tlb_range_asid,
+ &ftd, 1);
+ } else
+ sbi_remote_sfence_vma_asid(cmask,
+ start, size, asid);
} else {
- local_flush_tlb_all_asid(asid);
+ local_flush_tlb_range_asid(start, size, stride, asid);
}
} else {
if (broadcast) {
- sbi_remote_sfence_vma(cmask, start, size);
- } else if (size <= stride) {
- local_flush_tlb_page(start);
+ if (riscv_use_ipi_for_rfence()) {
+ ftd.asid = 0;
+ ftd.start = start;
+ ftd.size = size;
+ ftd.stride = stride;
+ on_each_cpu_mask(cmask,
+ __ipi_flush_tlb_range,
+ &ftd, 1);
+ } else
+ sbi_remote_sfence_vma(cmask, start, size);
} else {
- local_flush_tlb_all();
+ local_flush_tlb_range(start, size, stride);
}
}

@@ -66,23 +129,23 @@ static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,

void flush_tlb_mm(struct mm_struct *mm)
{
- __sbi_tlb_flush_range(mm, 0, -1, PAGE_SIZE);
+ __flush_tlb_range(mm, 0, -1, PAGE_SIZE);
}

void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
{
- __sbi_tlb_flush_range(vma->vm_mm, addr, PAGE_SIZE, PAGE_SIZE);
+ __flush_tlb_range(vma->vm_mm, addr, PAGE_SIZE, PAGE_SIZE);
}

void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end)
{
- __sbi_tlb_flush_range(vma->vm_mm, start, end - start, PAGE_SIZE);
+ __flush_tlb_range(vma->vm_mm, start, end - start, PAGE_SIZE);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end)
{
- __sbi_tlb_flush_range(vma->vm_mm, start, end - start, PMD_SIZE);
+ __flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
}
#endif
--
2.34.1


2022-11-14 10:47:44

by Anup Patel

Subject: [PATCH v11 5/7] RISC-V: Allow marking IPIs as suitable for remote FENCEs

To do remote FENCEs (i.e. remote TLB flushes) using IPI calls on the
RISC-V kernel, we need a hardware mechanism to directly inject IPIs from
supervisor mode (i.e. the RISC-V kernel) instead of using SBI calls.

The upcoming AIA IMSIC devices allow direct IPI injection from
supervisor mode (i.e. the RISC-V kernel). To support this, we extend the
riscv_ipi_set_virq_range() function so that IPI providers (i.e. irqchip
drivers) can mark IPIs as suitable for remote FENCEs.
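
For example (a sketch based on the diff below), a provider that can inject
IPIs without firmware assistance passes true, while the SBI IPI provider
passes false:

  /* e.g. CLINT timer driver: IPIs are injected directly, so usable for FENCEs */
  riscv_ipi_set_virq_range(virq, BITS_PER_BYTE, true);

  /* e.g. SBI IPI driver: IPIs go through M-mode firmware */
  riscv_ipi_set_virq_range(virq, BITS_PER_BYTE, false);

Generic code can then pick the cheaper path for remote fences, as PATCH6
and PATCH7 do:

  if (riscv_use_ipi_for_rfence())
          on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
  else
          sbi_remote_fence_i(&others);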

Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/smp.h | 18 ++++++++++++++++--
arch/riscv/kernel/sbi-ipi.c | 2 +-
arch/riscv/kernel/smp.c | 11 ++++++++++-
drivers/clocksource/timer-clint.c | 2 +-
4 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index 79ed0b73cd4e..56976e41a21e 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -16,6 +16,9 @@ struct seq_file;
extern unsigned long boot_cpu_hartid;

#ifdef CONFIG_SMP
+
+#include <linux/jump_label.h>
+
/*
* Mapping between linux logical cpu index and hartid.
*/
@@ -46,7 +49,12 @@ void riscv_ipi_disable(void);
bool riscv_ipi_have_virq_range(void);

/* Set the IPI interrupt numbers for arch (called by irqchip drivers) */
-void riscv_ipi_set_virq_range(int virq, int nr);
+void riscv_ipi_set_virq_range(int virq, int nr, bool use_for_rfence);
+
+/* Check if we can use IPIs for remote FENCEs */
+DECLARE_STATIC_KEY_FALSE(riscv_ipi_for_rfence);
+#define riscv_use_ipi_for_rfence() \
+ static_branch_unlikely(&riscv_ipi_for_rfence)

/* Secondary hart entry */
asmlinkage void smp_callin(void);
@@ -93,10 +101,16 @@ static inline bool riscv_ipi_have_virq_range(void)
return false;
}

-static inline void riscv_ipi_set_virq_range(int virq, int nr)
+static inline void riscv_ipi_set_virq_range(int virq, int nr,
+ bool use_for_rfence)
{
}

+static inline bool riscv_use_ipi_for_rfence(void)
+{
+ return false;
+}
+
#endif /* CONFIG_SMP */

#if defined(CONFIG_HOTPLUG_CPU) && (CONFIG_SMP)
diff --git a/arch/riscv/kernel/sbi-ipi.c b/arch/riscv/kernel/sbi-ipi.c
index f0a78420b127..ee8620104bd8 100644
--- a/arch/riscv/kernel/sbi-ipi.c
+++ b/arch/riscv/kernel/sbi-ipi.c
@@ -75,6 +75,6 @@ void __init sbi_ipi_init(void)
"irqchip/sbi-ipi:starting",
sbi_ipi_starting_cpu, sbi_ipi_dying_cpu);

- riscv_ipi_set_virq_range(virq, BITS_PER_BYTE);
+ riscv_ipi_set_virq_range(virq, BITS_PER_BYTE, false);
pr_info("providing IPIs using SBI IPI extension\n");
}
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index e8a20454d65b..74b8cb1a89ab 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -145,7 +145,10 @@ bool riscv_ipi_have_virq_range(void)
return (ipi_virq_base) ? true : false;
}

-void riscv_ipi_set_virq_range(int virq, int nr)
+DEFINE_STATIC_KEY_FALSE(riscv_ipi_for_rfence);
+EXPORT_SYMBOL_GPL(riscv_ipi_for_rfence);
+
+void riscv_ipi_set_virq_range(int virq, int nr, bool use_for_rfence)
{
int i, err;

@@ -168,6 +171,12 @@ void riscv_ipi_set_virq_range(int virq, int nr)

/* Enabled IPIs for boot CPU immediately */
riscv_ipi_enable();
+
+ /* Update RFENCE static key */
+ if (use_for_rfence)
+ static_branch_enable(&riscv_ipi_for_rfence);
+ else
+ static_branch_disable(&riscv_ipi_for_rfence);
}
EXPORT_SYMBOL_GPL(riscv_ipi_set_virq_range);

diff --git a/drivers/clocksource/timer-clint.c b/drivers/clocksource/timer-clint.c
index f9dd746a72c5..658049a5440b 100644
--- a/drivers/clocksource/timer-clint.c
+++ b/drivers/clocksource/timer-clint.c
@@ -249,7 +249,7 @@ static int __init clint_timer_init_dt(struct device_node *np)
goto fail_free_irq;
}

- riscv_ipi_set_virq_range(virq, BITS_PER_BYTE);
+ riscv_ipi_set_virq_range(virq, BITS_PER_BYTE, true);
clint_clear_ipi(clint_ipi_irq, NULL);

return 0;
--
2.34.1


2022-11-26 13:12:31

by Marc Zyngier

Subject: Re: [PATCH v11 3/7] genirq: Add mechanism to multiplex a single HW IPI

On Mon, 14 Nov 2022 09:39:00 +0000,
Anup Patel <[email protected]> wrote:
>
> All RISC-V platforms have a single HW IPI provided by the INTC local
> interrupt controller. The HW method to trigger the INTC IPI can be through
> an external irqchip (e.g. RISC-V AIA), through a platform-specific device
> (e.g. the SiFive CLINT timer), or through firmware (e.g. the SBI IPI call).
>
> To support multiple IPIs on RISC-V, we add a generic IPI multiplexing
> mechanism which helps us create multiple virtual IPIs using a single
> HW IPI. This generic IPI multiplexing is inspired by the Apple AIC
> irqchip driver and is shared by various RISC-V irqchip drivers.
>
> Signed-off-by: Anup Patel <[email protected]>
> ---
> include/linux/irq.h | 18 +++
> kernel/irq/Kconfig | 5 +
> kernel/irq/Makefile | 1 +
> kernel/irq/ipi-mux.c | 268 +++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 292 insertions(+)
> create mode 100644 kernel/irq/ipi-mux.c
>
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index c3eb89606c2b..5ab702cb0a5b 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -1266,6 +1266,24 @@ int __ipi_send_mask(struct irq_desc *desc, const struct cpumask *dest);
> int ipi_send_single(unsigned int virq, unsigned int cpu);
> int ipi_send_mask(unsigned int virq, const struct cpumask *dest);
>
> +/**
> + * struct ipi_mux_ops - IPI multiplex operations
> + *
> + * @ipi_mux_pre_handle: Optional function called before handling parent IPI
> + * @ipi_mux_post_handle: Optional function called after handling parent IPI
> + * @ipi_mux_send: Trigger parent IPI on target CPUs
> + */
> +struct ipi_mux_ops {
> + void (*ipi_mux_pre_handle)(unsigned int parent_virq, void *data);
> + void (*ipi_mux_post_handle)(unsigned int parent_virq, void *data);

I still haven't seen any decent explanation for this other than "we
need it". What is it that cannot be achieved via the irq_ack() and
irq_eoi() callbacks?
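
For reference, the chained-handler path below already brackets the mux
processing with the parent chip's callbacks (a sketch of the existing
chained_irq_enter()/chained_irq_exit() behaviour in kernel/irq/chip.c):

	chained_irq_enter(chip, desc);	/* chip->irq_mask_ack() or irq_ack() */
	ipi_mux_process();
	chained_irq_exit(chip, desc);	/* chip->irq_eoi() or irq_unmask() */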

> + void (*ipi_mux_send)(unsigned int parent_virq, void *data,
> + const struct cpumask *mask);

In what context would the 'parent_virq' be useful? You are *sending*
an IPI, not receiving it, and I expect the mechanism by which you send
such an IPI to be independent of the Linux view of the irq.

Also, please swap data and mask in the function signature.

> +};
> +
> +void ipi_mux_process(void);
> +int ipi_mux_create(unsigned int parent_virq, unsigned int nr_ipi,
> + const struct ipi_mux_ops *ops, void *data);
> +
> #ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
> /*
> * Registers a generic IRQ handling function as the top-level IRQ handler in
> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> index db3d174c53d4..df17dbc54b02 100644
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -86,6 +86,11 @@ config GENERIC_IRQ_IPI
> depends on SMP
> select IRQ_DOMAIN_HIERARCHY
>
> +# Generic IRQ IPI Mux support
> +config GENERIC_IRQ_IPI_MUX
> + bool
> + depends on SMP
> +
> # Generic MSI interrupt support
> config GENERIC_MSI_IRQ
> bool
> diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
> index b4f53717d143..f19d3080bf11 100644
> --- a/kernel/irq/Makefile
> +++ b/kernel/irq/Makefile
> @@ -15,6 +15,7 @@ obj-$(CONFIG_GENERIC_IRQ_MIGRATION) += cpuhotplug.o
> obj-$(CONFIG_PM_SLEEP) += pm.o
> obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
> obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
> +obj-$(CONFIG_GENERIC_IRQ_IPI_MUX) += ipi-mux.o
> obj-$(CONFIG_SMP) += affinity.o
> obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
> obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
> diff --git a/kernel/irq/ipi-mux.c b/kernel/irq/ipi-mux.c
> new file mode 100644
> index 000000000000..259e00366dd7
> --- /dev/null
> +++ b/kernel/irq/ipi-mux.c
> @@ -0,0 +1,268 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Multiplex several virtual IPIs over a single HW IPI.
> + *
> + * Copyright The Asahi Linux Contributors
> + * Copyright (c) 2022 Ventana Micro Systems Inc.
> + */
> +
> +#define pr_fmt(fmt) "ipi-mux: " fmt
> +#include <linux/cpu.h>
> +#include <linux/init.h>
> +#include <linux/irq.h>
> +#include <linux/irqchip.h>
> +#include <linux/irqchip/chained_irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +#include <linux/smp.h>
> +
> +struct ipi_mux_cpu {
> + atomic_t enable;
> + atomic_t bits;
> + struct cpumask send_mask;
> +};
> +
> +struct ipi_mux_control {
> + void *data;
> + unsigned int nr;

Honestly, I think we can get rid of this. The number of IPIs Linux
uses is pretty small, and assuming a huge value (like 32) would be
enough. It would save looking up this value on each IPI handling.

> + unsigned int parent_virq;
> + struct irq_domain *domain;
> + const struct ipi_mux_ops *ops;
> + struct ipi_mux_cpu __percpu *cpu;
> +};
> +
> +static struct ipi_mux_control *imux;
> +static DEFINE_STATIC_KEY_FALSE(imux_pre_handle);
> +static DEFINE_STATIC_KEY_FALSE(imux_post_handle);
> +
> +static void ipi_mux_mask(struct irq_data *d)
> +{
> + struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> +
> + atomic_andnot(BIT(irqd_to_hwirq(d)), &icpu->enable);
> +}
> +
> +static void ipi_mux_unmask(struct irq_data *d)
> +{
> + u32 ibit = BIT(irqd_to_hwirq(d));
> + struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> +
> + atomic_or(ibit, &icpu->enable);
> +
> + /*
> + * The atomic_or() above must complete before the atomic_read()
> + * below to avoid racing ipi_mux_send_mask().
> + */
> + smp_mb__after_atomic();
> +
> + /* If a pending IPI was unmasked, raise a parent IPI immediately. */
> + if (atomic_read(&icpu->bits) & ibit)
> + imux->ops->ipi_mux_send(imux->parent_virq, imux->data,
> + cpumask_of(smp_processor_id()));
> +}
> +
> +static void ipi_mux_send_mask(struct irq_data *d, const struct cpumask *mask)
> +{
> + u32 ibit = BIT(irqd_to_hwirq(d));
> + struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> + struct cpumask *send_mask = &icpu->send_mask;
> + unsigned long flags;
> + int cpu;
> +
> + /*
> + * We use send_mask as a per-CPU variable so disable local
> + * interrupts to avoid being preempted.
> + */
> + local_irq_save(flags);
> +
> + cpumask_clear(send_mask);
> +
> + for_each_cpu(cpu, mask) {
> + icpu = per_cpu_ptr(imux->cpu, cpu);
> + atomic_or(ibit, &icpu->bits);
> +
> + /*
> + * The atomic_or() above must complete before
> + * the atomic_read() below to avoid racing with
> + * ipi_mux_unmask().
> + */
> + smp_mb__after_atomic();
> +
> + if (atomic_read(&icpu->enable) & ibit)
> + cpumask_set_cpu(cpu, send_mask);
> + }
> +
> + /* Trigger the parent IPI */
> + imux->ops->ipi_mux_send(imux->parent_virq, imux->data, send_mask);
> +
> + local_irq_restore(flags);
> +}
> +
> +static const struct irq_chip ipi_mux_chip = {
> + .name = "IPI Mux",
> + .irq_mask = ipi_mux_mask,
> + .irq_unmask = ipi_mux_unmask,
> + .ipi_send_mask = ipi_mux_send_mask,
> +};

I really think this could either be supplied by the irqchip, or
somehow patched to avoid the pointless imux->ops->ipi_mux_send
indirection. Pointer chasing hurts.

> +
> +static int ipi_mux_domain_alloc(struct irq_domain *d, unsigned int virq,
> + unsigned int nr_irqs, void *arg)
> +{
> + int i;
> +
> + for (i = 0; i < nr_irqs; i++) {
> + irq_set_percpu_devid(virq + i);
> + irq_domain_set_info(d, virq + i, i,
> + &ipi_mux_chip, d->host_data,
> + handle_percpu_devid_irq, NULL, NULL);
> + }
> +
> + return 0;
> +}
> +
> +static const struct irq_domain_ops ipi_mux_domain_ops = {
> + .alloc = ipi_mux_domain_alloc,
> + .free = irq_domain_free_irqs_top,
> +};
> +
> +/**
> + * ipi_mux_process - Process multiplexed virtual IPIs
> + */
> +void ipi_mux_process(void)
> +{
> + struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> + irq_hw_number_t hwirq;
> + unsigned long ipis;
> + int en;

Please use an unsigned type here (see the rationale in atomic_t.txt).

> +
> + if (static_branch_unlikely(&imux_pre_handle))
> + imux->ops->ipi_mux_pre_handle(imux->parent_virq, imux->data);
> +
> + /*
> + * Reading the enable mask does not need to be ordered as long as
> + * this function is called from the interrupt handler, because only
> + * the CPU itself can change its own enable mask.
> + */
> + en = atomic_read(&icpu->enable);
> +
> + /*
> + * Clear the IPIs we are about to handle. This pairs with the
> + * atomic_or() in ipi_mux_send_mask().
> + */
> + ipis = atomic_fetch_andnot(en, &icpu->bits) & en;
> +
> + for_each_set_bit(hwirq, &ipis, imux->nr)
> + generic_handle_domain_irq(imux->domain, hwirq);
> +
> + if (static_branch_unlikely(&imux_post_handle))
> + imux->ops->ipi_mux_post_handle(imux->parent_virq, imux->data);

Do you see what I meant about the {pre,post}_handle callback, and how
they are *exactly* like irq_{ack,eoi}?

> +}
> +
> +static void ipi_mux_handler(struct irq_desc *desc)
> +{
> + struct irq_chip *chip = irq_desc_get_chip(desc);
> +
> + chained_irq_enter(chip, desc);
> + ipi_mux_process();
> + chained_irq_exit(chip, desc);
> +}
> +
> +/**
> + * ipi_mux_create - Create virtual IPIs multiplexed on top of a single
> + * parent IPI.
> + * @parent_virq: virq of the parent per-CPU IRQ
> + * @nr_ipi: number of virtual IPIs to create. This should
> + * be <= BITS_PER_TYPE(int)
> + * @ops: multiplexing operations for the parent IPI
> + * @data: opaque data used by the multiplexing operations

What is the use for data? If anything, that data should be passed via
the mux interrupt. But the whole point of this is to make the mux
invisible. So this whole 'data' business is a mystery to me.

> + *
> + * If the parent IPI > 0 then ipi_mux_process() will be automatically
> + * called via chained handler.
> + *
> + * If the parent IPI <= 0 then it is the responsibility of irqchip drivers
> + * to explicitly call ipi_mux_process() for processing muxed IPIs.

0 is a much more idiomatic value for "no parent IRQ". < 0 normally
represents an error.

> + *
> + * Returns first virq of the newly created virtual IPIs upon success
> + * or <=0 upon failure
> + */
> +int ipi_mux_create(unsigned int parent_virq, unsigned int nr_ipi,

Tell me how I can express a negative parent_virq here?

> + const struct ipi_mux_ops *ops, void *data)
> +{
> + struct fwnode_handle *fwnode;
> + struct irq_domain *domain;
> + int rc;
> +
> + if (imux)
> + return -EEXIST;
> +
> + if (BITS_PER_TYPE(int) < nr_ipi || !ops || !ops->ipi_mux_send)
> + return -EINVAL;
> +
> + if (parent_virq &&
> + !irqd_is_per_cpu(irq_desc_get_irq_data(irq_to_desc(parent_virq))))
> + return -EINVAL;

See how buggy this is if I follow your definition of "no parent IRQ"?

M.

--
Without deviation from the norm, progress is not possible.

2022-11-26 13:35:46

by Anup Patel

Subject: Re: [PATCH v11 3/7] genirq: Add mechanism to multiplex a single HW IPI

On Sat, Nov 26, 2022 at 6:12 PM Marc Zyngier <[email protected]> wrote:
>
> On Mon, 14 Nov 2022 09:39:00 +0000,
> Anup Patel <[email protected]> wrote:
> >
> > All RISC-V platforms have a single HW IPI provided by the INTC local
> > interrupt controller. The HW method to trigger the INTC IPI can be through
> > an external irqchip (e.g. RISC-V AIA), through a platform-specific device
> > (e.g. the SiFive CLINT timer), or through firmware (e.g. the SBI IPI call).
> >
> > To support multiple IPIs on RISC-V, we add a generic IPI multiplexing
> > mechanism which helps us create multiple virtual IPIs using a single
> > HW IPI. This generic IPI multiplexing is inspired by the Apple AIC
> > irqchip driver and is shared by various RISC-V irqchip drivers.
> >
> > Signed-off-by: Anup Patel <[email protected]>
> > ---
> > include/linux/irq.h | 18 +++
> > kernel/irq/Kconfig | 5 +
> > kernel/irq/Makefile | 1 +
> > kernel/irq/ipi-mux.c | 268 +++++++++++++++++++++++++++++++++++++++++++
> > 4 files changed, 292 insertions(+)
> > create mode 100644 kernel/irq/ipi-mux.c
> >
> > diff --git a/include/linux/irq.h b/include/linux/irq.h
> > index c3eb89606c2b..5ab702cb0a5b 100644
> > --- a/include/linux/irq.h
> > +++ b/include/linux/irq.h
> > @@ -1266,6 +1266,24 @@ int __ipi_send_mask(struct irq_desc *desc, const struct cpumask *dest);
> > int ipi_send_single(unsigned int virq, unsigned int cpu);
> > int ipi_send_mask(unsigned int virq, const struct cpumask *dest);
> >
> > +/**
> > + * struct ipi_mux_ops - IPI multiplex operations
> > + *
> > + * @ipi_mux_pre_handle: Optional function called before handling parent IPI
> > + * @ipi_mux_post_handle: Optional function called after handling parent IPI
> > + * @ipi_mux_send: Trigger parent IPI on target CPUs
> > + */
> > +struct ipi_mux_ops {
> > + void (*ipi_mux_pre_handle)(unsigned int parent_virq, void *data);
> > + void (*ipi_mux_post_handle)(unsigned int parent_virq, void *data);
>
> I still haven't seen any decent explanation for this other than "we
> need it". What is it that cannot be achieved via the irq_ack() and
> irq_eoi() callbacks?

Sure, I also think these are strange-looking callbacks. I will drop
them in the next patch revision.

>
> > + void (*ipi_mux_send)(unsigned int parent_virq, void *data,
> > + const struct cpumask *mask);
>
> In what context would the 'parent_virq' be useful? You are *sending*
> an IPI, not receiving it, and I expect the mechanism by which you send
> such an IPI to be independent of the Linux view of the irq.

Okay, I will drop the 'parent_virq' parameter.

>
> Also, please swap data and mask in the function signature.

Okay, I will change the order.

>
> > +};
> > +
> > +void ipi_mux_process(void);
> > +int ipi_mux_create(unsigned int parent_virq, unsigned int nr_ipi,
> > + const struct ipi_mux_ops *ops, void *data);
> > +
> > #ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
> > /*
> > * Registers a generic IRQ handling function as the top-level IRQ handler in
> > diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> > index db3d174c53d4..df17dbc54b02 100644
> > --- a/kernel/irq/Kconfig
> > +++ b/kernel/irq/Kconfig
> > @@ -86,6 +86,11 @@ config GENERIC_IRQ_IPI
> > depends on SMP
> > select IRQ_DOMAIN_HIERARCHY
> >
> > +# Generic IRQ IPI Mux support
> > +config GENERIC_IRQ_IPI_MUX
> > + bool
> > + depends on SMP
> > +
> > # Generic MSI interrupt support
> > config GENERIC_MSI_IRQ
> > bool
> > diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
> > index b4f53717d143..f19d3080bf11 100644
> > --- a/kernel/irq/Makefile
> > +++ b/kernel/irq/Makefile
> > @@ -15,6 +15,7 @@ obj-$(CONFIG_GENERIC_IRQ_MIGRATION) += cpuhotplug.o
> > obj-$(CONFIG_PM_SLEEP) += pm.o
> > obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
> > obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
> > +obj-$(CONFIG_GENERIC_IRQ_IPI_MUX) += ipi-mux.o
> > obj-$(CONFIG_SMP) += affinity.o
> > obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
> > obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
> > diff --git a/kernel/irq/ipi-mux.c b/kernel/irq/ipi-mux.c
> > new file mode 100644
> > index 000000000000..259e00366dd7
> > --- /dev/null
> > +++ b/kernel/irq/ipi-mux.c
> > @@ -0,0 +1,268 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Multiplex several virtual IPIs over a single HW IPI.
> > + *
> > + * Copyright The Asahi Linux Contributors
> > + * Copyright (c) 2022 Ventana Micro Systems Inc.
> > + */
> > +
> > +#define pr_fmt(fmt) "ipi-mux: " fmt
> > +#include <linux/cpu.h>
> > +#include <linux/init.h>
> > +#include <linux/irq.h>
> > +#include <linux/irqchip.h>
> > +#include <linux/irqchip/chained_irq.h>
> > +#include <linux/irqdomain.h>
> > +#include <linux/jump_label.h>
> > +#include <linux/percpu.h>
> > +#include <linux/smp.h>
> > +
> > +struct ipi_mux_cpu {
> > + atomic_t enable;
> > + atomic_t bits;
> > + struct cpumask send_mask;
> > +};
> > +
> > +struct ipi_mux_control {
> > + void *data;
> > + unsigned int nr;
>
> Honestly, I think we can get rid of this. The number of IPIs Linux
> uses is pretty small, and assuming a huge value (like 32) would be
> enough. It would save looking up this value on each IPI handling.

I had kept it in case some driver wanted to create fewer (< 32)
muxed IPIs.

>
> > + unsigned int parent_virq;
> > + struct irq_domain *domain;
> > + const struct ipi_mux_ops *ops;
> > + struct ipi_mux_cpu __percpu *cpu;
> > +};
> > +
> > +static struct ipi_mux_control *imux;
> > +static DEFINE_STATIC_KEY_FALSE(imux_pre_handle);
> > +static DEFINE_STATIC_KEY_FALSE(imux_post_handle);
> > +
> > +static void ipi_mux_mask(struct irq_data *d)
> > +{
> > + struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> > +
> > + atomic_andnot(BIT(irqd_to_hwirq(d)), &icpu->enable);
> > +}
> > +
> > +static void ipi_mux_unmask(struct irq_data *d)
> > +{
> > + u32 ibit = BIT(irqd_to_hwirq(d));
> > + struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> > +
> > + atomic_or(ibit, &icpu->enable);
> > +
> > + /*
> > + * The atomic_or() above must complete before the atomic_read()
> > + * below to avoid racing ipi_mux_send_mask().
> > + */
> > + smp_mb__after_atomic();
> > +
> > + /* If a pending IPI was unmasked, raise a parent IPI immediately. */
> > + if (atomic_read(&icpu->bits) & ibit)
> > + imux->ops->ipi_mux_send(imux->parent_virq, imux->data,
> > + cpumask_of(smp_processor_id()));
> > +}
> > +
> > +static void ipi_mux_send_mask(struct irq_data *d, const struct cpumask *mask)
> > +{
> > + u32 ibit = BIT(irqd_to_hwirq(d));
> > + struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> > + struct cpumask *send_mask = &icpu->send_mask;
> > + unsigned long flags;
> > + int cpu;
> > +
> > + /*
> > + * We use send_mask as a per-CPU variable so disable local
> > + * interrupts to avoid being preempted.
> > + */
> > + local_irq_save(flags);
> > +
> > + cpumask_clear(send_mask);
> > +
> > + for_each_cpu(cpu, mask) {
> > + icpu = per_cpu_ptr(imux->cpu, cpu);
> > + atomic_or(ibit, &icpu->bits);
> > +
> > + /*
> > + * The atomic_or() above must complete before
> > + * the atomic_read() below to avoid racing with
> > + * ipi_mux_unmask().
> > + */
> > + smp_mb__after_atomic();
> > +
> > + if (atomic_read(&icpu->enable) & ibit)
> > + cpumask_set_cpu(cpu, send_mask);
> > + }
> > +
> > + /* Trigger the parent IPI */
> > + imux->ops->ipi_mux_send(imux->parent_virq, imux->data, send_mask);
> > +
> > + local_irq_restore(flags);
> > +}
> > +
> > +static const struct irq_chip ipi_mux_chip = {
> > + .name = "IPI Mux",
> > + .irq_mask = ipi_mux_mask,
> > + .irq_unmask = ipi_mux_unmask,
> > + .ipi_send_mask = ipi_mux_send_mask,
> > +};
>
> I really think this could either be supplied by the irqchip, or
> somehow patched to avoid the pointless imux->ops->ipi_mux_send
> indirection. Pointer chasing hurts.

Once we remove ipi_mux_pre/post_handle() callbacks, the
"ops" will be pointless and we will be able to remove one level
of indirection here.

We certainly need a mux irqchip to implement the
mask/unmask semantics for muxed IPIs.

>
> > +
> > +static int ipi_mux_domain_alloc(struct irq_domain *d, unsigned int virq,
> > + unsigned int nr_irqs, void *arg)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < nr_irqs; i++) {
> > + irq_set_percpu_devid(virq + i);
> > + irq_domain_set_info(d, virq + i, i,
> > + &ipi_mux_chip, d->host_data,
> > + handle_percpu_devid_irq, NULL, NULL);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct irq_domain_ops ipi_mux_domain_ops = {
> > + .alloc = ipi_mux_domain_alloc,
> > + .free = irq_domain_free_irqs_top,
> > +};
> > +
> > +/**
> > + * ipi_mux_process - Process multiplexed virtual IPIs
> > + */
> > +void ipi_mux_process(void)
> > +{
> > + struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> > + irq_hw_number_t hwirq;
> > + unsigned long ipis;
> > + int en;
>
> Please use an unsigned type here (see the rationale in atomic_t.txt).

Sure, I will update.

>
> > +
> > + if (static_branch_unlikely(&imux_pre_handle))
> > + imux->ops->ipi_mux_pre_handle(imux->parent_virq, imux->data);
> > +
> > + /*
> > + * Reading the enable mask does not need to be ordered as long as
> > + * this function is called from the interrupt handler, because only
> > + * the CPU itself can change its own enable mask.
> > + */
> > + en = atomic_read(&icpu->enable);
> > +
> > + /*
> > + * Clear the IPIs we are about to handle. This pairs with the
> > + * atomic_or() in ipi_mux_send_mask().
> > + */
> > + ipis = atomic_fetch_andnot(en, &icpu->bits) & en;
> > +
> > + for_each_set_bit(hwirq, &ipis, imux->nr)
> > + generic_handle_domain_irq(imux->domain, hwirq);
> > +
> > + if (static_branch_unlikely(&imux_post_handle))
> > + imux->ops->ipi_mux_post_handle(imux->parent_virq, imux->data);
>
> Do you see what I meant about the {pre,post}_handle callback, and how
> they are *exactly* like irq_{ack,eoi}?

I see. I will remove these in the next patch revision.

>
> > +}
> > +
> > +static void ipi_mux_handler(struct irq_desc *desc)
> > +{
> > + struct irq_chip *chip = irq_desc_get_chip(desc);
> > +
> > + chained_irq_enter(chip, desc);
> > + ipi_mux_process();
> > + chained_irq_exit(chip, desc);
> > +}
> > +
> > +/**
> > + * ipi_mux_create - Create virtual IPIs multiplexed on top of a single
> > + * parent IPI.
> > + * @parent_virq: virq of the parent per-CPU IRQ
> > + * @nr_ipi: number of virtual IPIs to create. This should
> > + * be <= BITS_PER_TYPE(int)
> > + * @ops: multiplexing operations for the parent IPI
> > + * @data: opaque data used by the multiplexing operations
>
> What is the use for data? If anything, that data should be passed via
> the mux interrupt. But the whole point of this is to make the mux
> invisible. So this whole 'data' business is a mystery to me.

This is added only to pass back driver data in ipi_mux_send().

>
> > + *
> > + * If the parent IPI > 0 then ipi_mux_process() will be automatically
> > + * called via chained handler.
> > + *
> > + * If the parent IPI <= 0 then it is the responsibility of irqchip drivers
> > + * to explicitly call ipi_mux_process() for processing muxed IPIs.
>
> 0 is a much more idiomatic value for "no parent IRQ". < 0 normally
> represents an error.

I am going to remove the "parent_virq" parameter itself so this
will go away.

>
> > + *
> > + * Returns first virq of the newly created virtual IPIs upon success
> > + * or <=0 upon failure
> > + */
> > +int ipi_mux_create(unsigned int parent_virq, unsigned int nr_ipi,
>
> Tell me how I can express a negative parent_virq here?

Sure, I will fix this.

>
> > + const struct ipi_mux_ops *ops, void *data)
> > +{
> > + struct fwnode_handle *fwnode;
> > + struct irq_domain *domain;
> > + int rc;
> > +
> > + if (imux)
> > + return -EEXIST;
> > +
> > + if (BITS_PER_TYPE(int) < nr_ipi || !ops || !ops->ipi_mux_send)
> > + return -EINVAL;
> > +
> > + if (parent_virq &&
> > + !irqd_is_per_cpu(irq_desc_get_irq_data(irq_to_desc(parent_virq))))
> > + return -EINVAL;
>
> See how buggy this is if I follow your definition of "no parent IRQ"?

Sure, I will fix this.

>
> M.
>
> --
> Without deviation from the norm, progress is not possible.

Thanks,
Anup

2022-11-26 15:45:49

by Marc Zyngier

Subject: Re: [PATCH v11 3/7] genirq: Add mechanism to multiplex a single HW IPI

On Sat, 26 Nov 2022 13:31:46 +0000,
Anup Patel <[email protected]> wrote:
>
> On Sat, Nov 26, 2022 at 6:12 PM Marc Zyngier <[email protected]> wrote:
> >
> > On Mon, 14 Nov 2022 09:39:00 +0000,
> > Anup Patel <[email protected]> wrote:
> > >
> > > +struct ipi_mux_control {
> > > + void *data;
> > > + unsigned int nr;
> >
> > Honestly, I think we can get rid of this. The number of IPIs Linux
> > uses is pretty small, and assuming a huge value (like 32) would be
> > enough. It would save looking up this value on each IPI handling.
>
> I had kept it in case some driver wanted to create fewer (< 32)
> muxed IPIs.

I'm fine with being able to specify the max, but I'm not sure there
is a need to keep track of it any further. Certainly, the overhead of
loading this value on each IPI could be removed. On most architectures,
for_each_set_bit() and co are better optimised with a fixed number of
bits.
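
Something like this sketch (untested), where the loop bound becomes a
compile-time constant:

	ipis = atomic_fetch_andnot(en, &icpu->bits) & en;
	for_each_set_bit(hwirq, &ipis, BITS_PER_TYPE(int))
		generic_handle_domain_irq(imux->domain, hwirq);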

> > > +static const struct irq_chip ipi_mux_chip = {
> > > + .name = "IPI Mux",
> > > + .irq_mask = ipi_mux_mask,
> > > + .irq_unmask = ipi_mux_unmask,
> > > + .ipi_send_mask = ipi_mux_send_mask,
> > > +};
> >
> > I really think this could either be supplied by the irqchip, or
> > somehow patched to avoid the pointless imux->ops->ipi_mux_send
> > indirection. Pointer chasing hurts.
>
> Once we remove ipi_mux_pre/post_handle() callbacks, the
> "ops" will be pointless and we will be able to remove one level
> of indirection here.
>
> We certainly need a mux irqchip to implement the
> mask/unmask semantics for muxed IPIs.

I'm not disputing that last point.

> > > +/**
> > > + * ipi_mux_create - Create virtual IPIs multiplexed on top of a single
> > > + * parent IPI.
> > > + * @parent_virq: virq of the parent per-CPU IRQ
> > > + * @nr_ipi: number of virtual IPIs to create. This should
> > > + * be <= BITS_PER_TYPE(int)
> > > + * @ops: multiplexing operations for the parent IPI
> > > + * @data: opaque data used by the multiplexing operations
> >
> > What is the use for data? If anything, that data should be passed via
> > the mux interrupt. But the whole point of this is to make the mux
> > invisible. So this whole 'data' business is a mystery to me.
>
> This is added only to pass back driver data in ipi_mux_send().

Again, what is the purpose of such data? If you need per-interrupt
data, this should be provided by the requester of the interrupt.

M.

--
Without deviation from the norm, progress is not possible.

2022-11-26 16:06:20

by Anup Patel

Subject: Re: [PATCH v11 3/7] genirq: Add mechanism to multiplex a single HW IPI

On Sat, Nov 26, 2022 at 7:58 PM Marc Zyngier <[email protected]> wrote:
>
> On Sat, 26 Nov 2022 13:31:46 +0000,
> Anup Patel <[email protected]> wrote:
> >
> > On Sat, Nov 26, 2022 at 6:12 PM Marc Zyngier <[email protected]> wrote:
> > >
> > > On Mon, 14 Nov 2022 09:39:00 +0000,
> > > Anup Patel <[email protected]> wrote:
> > > >
> > > > +struct ipi_mux_control {
> > > > + void *data;
> > > > + unsigned int nr;
> > >
> > > Honestly, I think we can get rid of this. The number of IPIs Linux
> > > uses is pretty small, and assuming a huge value (like 32) would be
> > > enough. It would save looking up this value on each IPI handling.
> >
> > I had kept it in case some driver wanted to create fewer (< 32)
> > muxed IPIs.
>
> I'm fine with being able to specify the max, but I'm not sure there
> is a need to keep track of it any further. Certainly, the overhead of
> loading this value on each IPI could be removed. On most architectures,
> for_each_set_bit() and co are better optimised with a fixed number of
> bits.

Okay, I will update like you suggested.

>
> > > > +static const struct irq_chip ipi_mux_chip = {
> > > > + .name = "IPI Mux",
> > > > + .irq_mask = ipi_mux_mask,
> > > > + .irq_unmask = ipi_mux_unmask,
> > > > + .ipi_send_mask = ipi_mux_send_mask,
> > > > +};
> > >
> > > I really think this could either be supplied by the irqchip, or
> > > somehow patched to avoid the pointless imux->ops->ipi_mux_send
> > > indirection. Pointer chasing hurts.
> >
> > Once we remove ipi_mux_pre/post_handle() callbacks, the
> > "ops" will be pointless and we will be able to remove one level
> > of indirection here.
> >
> > We certainly need a mux irqchip to implement the
> > mask/unmask semantics for muxed IPIs.
>
> I'm not disputing that last point.
>
> > > > +/**
> > > > + * ipi_mux_create - Create virtual IPIs multiplexed on top of a single
> > > > + * parent IPI.
> > > > + * @parent_virq: virq of the parent per-CPU IRQ
> > > > + * @nr_ipi: number of virtual IPIs to create. This should
> > > > + * be <= BITS_PER_TYPE(int)
> > > > + * @ops: multiplexing operations for the parent IPI
> > > > + * @data: opaque data used by the multiplexing operations
> > >
> > > What is the use for data? If anything, that data should be passed via
> > > the mux interrupt. But the whole point of this is to make the mux
> > > invisible. So this whole 'data' business is a mystery to me.
> >
> > This is added only to pass back driver data in ipi_mux_send().
>
> Again, what is the purpose of such data? If you need per-interrupt
> data, this should be provided by the requester of the interrupt.

Currently, the irqchip drivers that we care about don't need this
data pointer, so I will remove it. If required, we can add it back in
the future.

>
> M.
>
> --
> Without deviation from the norm, progress is not possible.

Thanks,
Anup