2023-04-08 17:18:29

by Shanker Donthineni

[permalink] [raw]
Subject: [PATCH v2 0/3] Increase the number of IRQ descriptors for SPARSEIRQ

The ARM64 architecture uses SPARSEIRQ with a default value of NR_IRQS,
which is set to 64. This means that only 64+8192 IRQ descriptors are
allowed, which may not be sufficient for modern ARM64 servers that
have a large number of IO devices and GIC hardware that supports
direct vSGI and vLPI injection features.

This limitation has caused issues when attempting to launch multiple
virtual machines with GICv4.1 features, resulting in the error message
'kvm_err("VPE IRQ allocation failure\n")'. The root cause of this issue
is the ~8K IRQ descriptor limit.

To address this issue, an initial proposal was made to define NR_IRQS
to 2^19 for ARM64. However, Marc Zyngier suggested implementing a
generic solution instead of hard-coded values. Thomas Gleixner advised
to use the maple tree data structure and provided most of the necessary
functions.

For more information, refer to the discussion thread at
https://lore.kernel.org/linux-arm-kernel/[email protected]/.

This patch series converts the static memory allocation to dynamic using
the maple tree, and increases the maximum number of IRQ descriptors to
INT_MAX from NR_IRQS+8192. This change has been tested on an ARM64 server
with CONFIG_SPARSE_IRQ=y, where 256 virtual machines were launched,
creating a total of 128K+ IRQ descriptors, and IRQ injection was verified.

The maple_tree RCU mode fixes are prerequisites, expecting all these
changes will be available in v6.4-rc1.
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/lib/maple_tree.c?h=next-20230406
- maple_tree: add RCU lock checking to rcu callback functions
- maple_tree: add smp_rmb() to dead node detection
- maple_tree: fix write memory barrier of nodes once dead for RCU mode
- maple_tree: remove extra smp_wmb() from mas_dead_leaves()
- maple_tree: fix freeing of nodes in rcu mode
- maple_tree: detect dead nodes in mas_start()
- maple_tree: be more cautious about dead nodes

Changes in v2:
- The patches have been updated to v6.3-rc5.
- The patches 2/5 and 4/5 have been removed as they are unnecessary.
- The review comments from Thomas have been addressed.

Shanker Donthineni (3):
genirq: Use hlist for managing resend handlers
genirq: Encapsulate sparse bitmap handling
genirq: Use the maple tree for IRQ descriptors management

include/linux/irqdesc.h | 3 ++
kernel/irq/chip.c | 1 +
kernel/irq/internals.h | 5 +--
kernel/irq/irqdesc.c | 80 ++++++++++++++++++++++++++---------------
kernel/irq/resend.c | 41 ++++++++++++---------
5 files changed, 82 insertions(+), 48 deletions(-)

--
2.25.1


2023-04-08 17:27:43

by Shanker Donthineni

[permalink] [raw]
Subject: [PATCH v2 3/3] genirq: Use the maple tree for IRQ descriptors management

The current implementation uses a static bitmap and a radix tree
to manage IRQ allocation and irq_desc pointer store respectively.
However, the size of the bitmap is constrained by the build time
macro MAX_SPARSE_IRQS, which may not be sufficient to support the
high-end servers, particularly those with GICv4.1 hardware, which
require a large interrupt space to cover LPIs and vSGIs

The maple tree is a highly efficient data structure for storing
non-overlapping ranges and can handle a large number of entries,
up to ULONG_MAX. It can be utilized for both storing IRQD and
identifying available free spaces.

The interrupt descriptors management can be simplified by switching
to a maple tree data structure, which offers greater flexibility
and scalability. To support modern servers, the maximum number of
IRQs has been increased to INT_MAX, which provides a more adequate
value than the previous limit of NR_IRQS+8192.

Signed-off-by: Shanker Donthineni <[email protected]>
---
kernel/irq/internals.h | 2 +-
kernel/irq/irqdesc.c | 54 +++++++++++++++++++++++-------------------
2 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 5d741b0e7d5e..e35de737802c 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -12,7 +12,7 @@
#include <linux/sched/clock.h>

#ifdef CONFIG_SPARSE_IRQ
-# define MAX_SPARSE_IRQS (NR_IRQS + 8196)
+# define MAX_SPARSE_IRQS INT_MAX
#else
# define MAX_SPARSE_IRQS NR_IRQS
#endif
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index e0c259769d3a..e2e95e937c11 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -12,8 +12,7 @@
#include <linux/export.h>
#include <linux/interrupt.h>
#include <linux/kernel_stat.h>
-#include <linux/radix-tree.h>
-#include <linux/bitmap.h>
+#include <linux/maple_tree.h>
#include <linux/irqdomain.h>
#include <linux/sysfs.h>

@@ -131,17 +130,38 @@ int nr_irqs = NR_IRQS;
EXPORT_SYMBOL_GPL(nr_irqs);

static DEFINE_MUTEX(sparse_irq_lock);
-static DECLARE_BITMAP(allocated_irqs, MAX_SPARSE_IRQS);
+static struct maple_tree sparse_irqs = MTREE_INIT_EXT(sparse_irqs,
+ MT_FLAGS_ALLOC_RANGE |
+ MT_FLAGS_LOCK_EXTERN |
+ MT_FLAGS_USE_RCU,
+ sparse_irq_lock);

static int irq_find_free_area(unsigned int from, unsigned int cnt)
{
- return bitmap_find_next_zero_area(allocated_irqs, MAX_SPARSE_IRQS,
- from, cnt, 0);
+ MA_STATE(mas, &sparse_irqs, 0, 0);
+
+ if (mas_empty_area(&mas, from, MAX_SPARSE_IRQS, cnt))
+ return -ENOSPC;
+ return mas.index;
}

static unsigned int irq_find_next_irq(unsigned int offset)
{
- return find_next_bit(allocated_irqs, nr_irqs, offset);
+ struct irq_desc *desc = mt_next(&sparse_irqs, offset, nr_irqs);
+
+ return desc ? irq_desc_get_irq(desc) : nr_irqs;
+}
+
+static void irq_insert_desc(unsigned int irq, struct irq_desc *desc)
+{
+ MA_STATE(mas, &sparse_irqs, irq, irq);
+ WARN_ON(mas_store_gfp(&mas, desc, GFP_KERNEL) != 0);
+}
+
+static void delete_irq_desc(unsigned int irq)
+{
+ MA_STATE(mas, &sparse_irqs, irq, irq);
+ mas_erase(&mas);
}

#ifdef CONFIG_SPARSE_IRQ
@@ -355,26 +375,14 @@ static void irq_sysfs_del(struct irq_desc *desc) {}

#endif /* CONFIG_SYSFS */

-static RADIX_TREE(irq_desc_tree, GFP_KERNEL);
-
-static void irq_insert_desc(unsigned int irq, struct irq_desc *desc)
-{
- radix_tree_insert(&irq_desc_tree, irq, desc);
-}
-
struct irq_desc *irq_to_desc(unsigned int irq)
{
- return radix_tree_lookup(&irq_desc_tree, irq);
+ return mtree_load(&sparse_irqs, irq);
}
#ifdef CONFIG_KVM_BOOK3S_64_HV_MODULE
EXPORT_SYMBOL_GPL(irq_to_desc);
#endif

-static void delete_irq_desc(unsigned int irq)
-{
- radix_tree_delete(&irq_desc_tree, irq);
-}
-
#ifdef CONFIG_SMP
static void free_masks(struct irq_desc *desc)
{
@@ -519,7 +527,6 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,
irq_sysfs_add(start + i, desc);
irq_add_debugfs_entry(start + i, desc);
}
- bitmap_set(allocated_irqs, start, cnt);
return start;

err:
@@ -559,7 +566,6 @@ int __init early_irq_init(void)

for (i = 0; i < initcnt; i++) {
desc = alloc_desc(i, node, 0, NULL, NULL);
- set_bit(i, allocated_irqs);
irq_insert_desc(i, desc);
}
return arch_early_irq_init();
@@ -616,6 +622,7 @@ static void free_desc(unsigned int irq)
raw_spin_lock_irqsave(&desc->lock, flags);
desc_set_defaults(irq, desc, irq_desc_get_node(desc), NULL, NULL);
raw_spin_unlock_irqrestore(&desc->lock, flags);
+ delete_irq_desc(irq);
}

static inline int alloc_descs(unsigned int start, unsigned int cnt, int node,
@@ -628,8 +635,8 @@ static inline int alloc_descs(unsigned int start, unsigned int cnt, int node,
struct irq_desc *desc = irq_to_desc(start + i);

desc->owner = owner;
+ irq_insert_desc(start + i, desc);
}
- bitmap_set(allocated_irqs, start, cnt);
return start;
}

@@ -641,7 +648,7 @@ static int irq_expand_nr_irqs(unsigned int nr)
void irq_mark_irq(unsigned int irq)
{
mutex_lock(&sparse_irq_lock);
- bitmap_set(allocated_irqs, irq, 1);
+ irq_insert_desc(irq, irq_desc + irq);
mutex_unlock(&sparse_irq_lock);
}

@@ -785,7 +792,6 @@ void irq_free_descs(unsigned int from, unsigned int cnt)
for (i = 0; i < cnt; i++)
free_desc(from + i);

- bitmap_clear(allocated_irqs, from, cnt);
mutex_unlock(&sparse_irq_lock);
}
EXPORT_SYMBOL_GPL(irq_free_descs);
--
2.25.1

2023-04-08 17:31:35

by Shanker Donthineni

[permalink] [raw]
Subject: [PATCH v2 1/3] genirq: Use hlist for managing resend handlers

The current implementation utilizes a bitmap for managing IRQ resend
handlers, which is allocated based on the SPARSE_IRQ/NR_IRQS macros.
However, this method may not efficiently utilize memory during runtime,
particularly when IRQ_BITMAP_BITS is large.

This proposed patch aims to address this issue by using hlist to manage
IRQ resend handlers instead of relying on static memory allocation.
Additionally, a new function, clear_irq_resend(), is introduced and
called from irq_shutdown to ensure a graceful teardown of IRQD.

Signed-off-by: Shanker Donthineni <[email protected]>
---
include/linux/irqdesc.h | 3 +++
kernel/irq/chip.c | 1 +
kernel/irq/internals.h | 1 +
kernel/irq/irqdesc.c | 6 ++++++
kernel/irq/resend.c | 41 ++++++++++++++++++++++++-----------------
5 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 844a8e30e6de..d9451d456a73 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -102,6 +102,9 @@ struct irq_desc {
int parent_irq;
struct module *owner;
const char *name;
+#ifdef CONFIG_HARDIRQS_SW_RESEND
+ struct hlist_node resend_node;
+#endif
} ____cacheline_internodealigned_in_smp;

#ifdef CONFIG_SPARSE_IRQ
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 49e7bc871fec..2eac5532c3c8 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -306,6 +306,7 @@ static void __irq_disable(struct irq_desc *desc, bool mask);
void irq_shutdown(struct irq_desc *desc)
{
if (irqd_is_started(&desc->irq_data)) {
+ clear_irq_resend(desc);
desc->depth = 1;
if (desc->irq_data.chip->irq_shutdown) {
desc->irq_data.chip->irq_shutdown(&desc->irq_data);
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 5fdc0b557579..2fd17057ed4b 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -113,6 +113,7 @@ irqreturn_t handle_irq_event(struct irq_desc *desc);

/* Resending of interrupts :*/
int check_irq_resend(struct irq_desc *desc, bool inject);
+void clear_irq_resend(struct irq_desc *desc);
bool irq_wait_for_poll(struct irq_desc *desc);
void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action);

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 240e145e969f..47543b5a0edb 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -415,6 +415,9 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags,
desc_set_defaults(irq, desc, node, affinity, owner);
irqd_set(&desc->irq_data, flags);
kobject_init(&desc->kobj, &irq_kobj_type);
+#ifdef CONFIG_HARDIRQS_SW_RESEND
+ INIT_HLIST_NODE(&desc->resend_node);
+#endif

return desc;

@@ -581,6 +584,9 @@ int __init early_irq_init(void)
mutex_init(&desc[i].request_mutex);
init_waitqueue_head(&desc[i].wait_for_threads);
desc_set_defaults(i, &desc[i], node, NULL, NULL);
+#ifdef CONFIG_HARDIRQS_SW_RESEND
+ INIT_HLIST_NODE(&desc->resend_node);
+#endif
}
return arch_early_irq_init();
}
diff --git a/kernel/irq/resend.c b/kernel/irq/resend.c
index 0c46e9fe3a89..d3db2628a720 100644
--- a/kernel/irq/resend.c
+++ b/kernel/irq/resend.c
@@ -21,8 +21,9 @@

#ifdef CONFIG_HARDIRQS_SW_RESEND

-/* Bitmap to handle software resend of interrupts: */
-static DECLARE_BITMAP(irqs_resend, IRQ_BITMAP_BITS);
+/* hlist_head to handle software resend of interrupts: */
+static HLIST_HEAD(irq_resend_list);
+static DEFINE_RAW_SPINLOCK(irq_resend_lock);

/*
* Run software resends of IRQ's
@@ -30,18 +31,17 @@ static DECLARE_BITMAP(irqs_resend, IRQ_BITMAP_BITS);
static void resend_irqs(struct tasklet_struct *unused)
{
struct irq_desc *desc;
- int irq;
-
- while (!bitmap_empty(irqs_resend, nr_irqs)) {
- irq = find_first_bit(irqs_resend, nr_irqs);
- clear_bit(irq, irqs_resend);
- desc = irq_to_desc(irq);
- if (!desc)
- continue;
- local_irq_disable();
+
+ raw_spin_lock_irq(&irq_resend_lock);
+ while (!hlist_empty(&irq_resend_list)) {
+ desc = hlist_entry(irq_resend_list.first, struct irq_desc,
+ resend_node);
+ hlist_del_init(&desc->resend_node);
+ raw_spin_unlock(&irq_resend_lock);
desc->handle_irq(desc);
- local_irq_enable();
+ raw_spin_lock(&irq_resend_lock);
}
+ raw_spin_unlock_irq(&irq_resend_lock);
}

/* Tasklet to handle resend: */
@@ -49,8 +49,6 @@ static DECLARE_TASKLET(resend_tasklet, resend_irqs);

static int irq_sw_resend(struct irq_desc *desc)
{
- unsigned int irq = irq_desc_get_irq(desc);
-
/*
* Validate whether this interrupt can be safely injected from
* non interrupt context
@@ -70,16 +68,25 @@ static int irq_sw_resend(struct irq_desc *desc)
*/
if (!desc->parent_irq)
return -EINVAL;
- irq = desc->parent_irq;
}

- /* Set it pending and activate the softirq: */
- set_bit(irq, irqs_resend);
+ /* Add to resend_list and activate the softirq: */
+ raw_spin_lock(&irq_resend_lock);
+ hlist_add_head(&desc->resend_node, &irq_resend_list);
+ raw_spin_unlock(&irq_resend_lock);
tasklet_schedule(&resend_tasklet);
return 0;
}

+void clear_irq_resend(struct irq_desc *desc)
+{
+ raw_spin_lock(&irq_resend_lock);
+ hlist_del_init(&desc->resend_node);
+ raw_spin_unlock(&irq_resend_lock);
+}
#else
+void clear_irq_resend(struct irq_desc *desc) {}
+
static int irq_sw_resend(struct irq_desc *desc)
{
return -EINVAL;
--
2.25.1

2023-04-08 17:33:50

by Shanker Donthineni

[permalink] [raw]
Subject: [PATCH v2 2/3] genirq: Encapsulate sparse bitmap handling

Move the open coded sparse bitmap handling into helper functions as
a preparatory step for converting the sparse interrupt management
to a maple tree.

No functional change.

Signed-off-by: Shanker Donthineni <[email protected]>
---
kernel/irq/internals.h | 4 ++--
kernel/irq/irqdesc.c | 28 +++++++++++++++++++---------
2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 2fd17057ed4b..5d741b0e7d5e 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -12,9 +12,9 @@
#include <linux/sched/clock.h>

#ifdef CONFIG_SPARSE_IRQ
-# define IRQ_BITMAP_BITS (NR_IRQS + 8196)
+# define MAX_SPARSE_IRQS (NR_IRQS + 8196)
#else
-# define IRQ_BITMAP_BITS NR_IRQS
+# define MAX_SPARSE_IRQS NR_IRQS
#endif

#define istate core_internal_state__do_not_mess_with_it
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 47543b5a0edb..e0c259769d3a 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -131,7 +131,18 @@ int nr_irqs = NR_IRQS;
EXPORT_SYMBOL_GPL(nr_irqs);

static DEFINE_MUTEX(sparse_irq_lock);
-static DECLARE_BITMAP(allocated_irqs, IRQ_BITMAP_BITS);
+static DECLARE_BITMAP(allocated_irqs, MAX_SPARSE_IRQS);
+
+static int irq_find_free_area(unsigned int from, unsigned int cnt)
+{
+ return bitmap_find_next_zero_area(allocated_irqs, MAX_SPARSE_IRQS,
+ from, cnt, 0);
+}
+
+static unsigned int irq_find_next_irq(unsigned int offset)
+{
+ return find_next_bit(allocated_irqs, nr_irqs, offset);
+}

#ifdef CONFIG_SPARSE_IRQ

@@ -519,7 +530,7 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,

static int irq_expand_nr_irqs(unsigned int nr)
{
- if (nr > IRQ_BITMAP_BITS)
+ if (nr > MAX_SPARSE_IRQS)
return -ENOMEM;
nr_irqs = nr;
return 0;
@@ -537,11 +548,11 @@ int __init early_irq_init(void)
printk(KERN_INFO "NR_IRQS: %d, nr_irqs: %d, preallocated irqs: %d\n",
NR_IRQS, nr_irqs, initcnt);

- if (WARN_ON(nr_irqs > IRQ_BITMAP_BITS))
- nr_irqs = IRQ_BITMAP_BITS;
+ if (WARN_ON(nr_irqs > MAX_SPARSE_IRQS))
+ nr_irqs = MAX_SPARSE_IRQS;

- if (WARN_ON(initcnt > IRQ_BITMAP_BITS))
- initcnt = IRQ_BITMAP_BITS;
+ if (WARN_ON(initcnt > MAX_SPARSE_IRQS))
+ initcnt = MAX_SPARSE_IRQS;

if (initcnt > nr_irqs)
nr_irqs = initcnt;
@@ -816,8 +827,7 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,

mutex_lock(&sparse_irq_lock);

- start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
- from, cnt, 0);
+ start = irq_find_free_area(from, cnt);
ret = -EEXIST;
if (irq >=0 && start != irq)
goto unlock;
@@ -842,7 +852,7 @@ EXPORT_SYMBOL_GPL(__irq_alloc_descs);
*/
unsigned int irq_get_next_irq(unsigned int offset)
{
- return find_next_bit(allocated_irqs, nr_irqs, offset);
+ return irq_find_next_irq(offset);
}

struct irq_desc *
--
2.25.1

2023-04-09 12:18:25

by Shanker Donthineni

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] genirq: Use hlist for managing resend handlers

Hi Marc,

On 4/9/23 04:10, Marc Zyngier wrote:
> External email: Use caution opening links or attachments
>
>
> On Sat, 08 Apr 2023 18:15:24 +0100,
> Shanker Donthineni <[email protected]> wrote:
>>
>> The current implementation utilizes a bitmap for managing IRQ resend
>> handlers, which is allocated based on the SPARSE_IRQ/NR_IRQS macros.
>> However, this method may not efficiently utilize memory during runtime,
>> particularly when IRQ_BITMAP_BITS is large.
>>
>> This proposed patch aims to address this issue by using hlist to manage
>
> s/This proposed patch aims to//
>
>> IRQ resend handlers instead of relying on static memory allocation.
>> Additionally, a new function, clear_irq_resend(), is introduced and
>> called from irq_shutdown to ensure a graceful teardown of IRQD.
>
> s/IRQD/the interrupt/
>
Ack

>>
>> Signed-off-by: Shanker Donthineni <[email protected]>
>> ---
>> include/linux/irqdesc.h | 3 +++
>> kernel/irq/chip.c | 1 +
>> kernel/irq/internals.h | 1 +
>> kernel/irq/irqdesc.c | 6 ++++++
>> kernel/irq/resend.c | 41 ++++++++++++++++++++++++-----------------
>> 5 files changed, 35 insertions(+), 17 deletions(-)
>>
>> diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
>> index 844a8e30e6de..d9451d456a73 100644
>> --- a/include/linux/irqdesc.h
>> +++ b/include/linux/irqdesc.h
>> @@ -102,6 +102,9 @@ struct irq_desc {
>> int parent_irq;
>> struct module *owner;
>> const char *name;
>> +#ifdef CONFIG_HARDIRQS_SW_RESEND
>> + struct hlist_node resend_node;
>> +#endif
>> } ____cacheline_internodealigned_in_smp;
>>
>> #ifdef CONFIG_SPARSE_IRQ
>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> index 49e7bc871fec..2eac5532c3c8 100644
>> --- a/kernel/irq/chip.c
>> +++ b/kernel/irq/chip.c
>> @@ -306,6 +306,7 @@ static void __irq_disable(struct irq_desc *desc, bool mask);
>> void irq_shutdown(struct irq_desc *desc)
>> {
>> if (irqd_is_started(&desc->irq_data)) {
>> + clear_irq_resend(desc);
>> desc->depth = 1;
>> if (desc->irq_data.chip->irq_shutdown) {
>> desc->irq_data.chip->irq_shutdown(&desc->irq_data);
>> diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
>> index 5fdc0b557579..2fd17057ed4b 100644
>> --- a/kernel/irq/internals.h
>> +++ b/kernel/irq/internals.h
>> @@ -113,6 +113,7 @@ irqreturn_t handle_irq_event(struct irq_desc *desc);
>>
>> /* Resending of interrupts :*/
>> int check_irq_resend(struct irq_desc *desc, bool inject);
>> +void clear_irq_resend(struct irq_desc *desc);
>> bool irq_wait_for_poll(struct irq_desc *desc);
>> void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action);
>>
>> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
>> index 240e145e969f..47543b5a0edb 100644
>> --- a/kernel/irq/irqdesc.c
>> +++ b/kernel/irq/irqdesc.c
>> @@ -415,6 +415,9 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags,
>> desc_set_defaults(irq, desc, node, affinity, owner);
>> irqd_set(&desc->irq_data, flags);
>> kobject_init(&desc->kobj, &irq_kobj_type);
>> +#ifdef CONFIG_HARDIRQS_SW_RESEND
>> + INIT_HLIST_NODE(&desc->resend_node);
>> +#endif
>
> Please add a helper that hides the #ifdefery from the core code, and
> use it in both spots.
> Sure, I will add a helper function irq_resend_init()

>>
>> return desc;
>>
>> @@ -581,6 +584,9 @@ int __init early_irq_init(void)
>> mutex_init(&desc[i].request_mutex);
>> init_waitqueue_head(&desc[i].wait_for_threads);
>> desc_set_defaults(i, &desc[i], node, NULL, NULL);
>> +#ifdef CONFIG_HARDIRQS_SW_RESEND
>> + INIT_HLIST_NODE(&desc->resend_node);
>> +#endif
>> }
>> return arch_early_irq_init();
>> }
>> diff --git a/kernel/irq/resend.c b/kernel/irq/resend.c
>> index 0c46e9fe3a89..d3db2628a720 100644
>> --- a/kernel/irq/resend.c
>> +++ b/kernel/irq/resend.c
>> @@ -21,8 +21,9 @@
>>
>> #ifdef CONFIG_HARDIRQS_SW_RESEND
>>
>> -/* Bitmap to handle software resend of interrupts: */
>> -static DECLARE_BITMAP(irqs_resend, IRQ_BITMAP_BITS);
>> +/* hlist_head to handle software resend of interrupts: */
>> +static HLIST_HEAD(irq_resend_list);
>> +static DEFINE_RAW_SPINLOCK(irq_resend_lock);
>>
>> /*
>> * Run software resends of IRQ's
>> @@ -30,18 +31,17 @@ static DECLARE_BITMAP(irqs_resend, IRQ_BITMAP_BITS);
>> static void resend_irqs(struct tasklet_struct *unused)
>> {
>> struct irq_desc *desc;
>> - int irq;
>> -
>> - while (!bitmap_empty(irqs_resend, nr_irqs)) {
>> - irq = find_first_bit(irqs_resend, nr_irqs);
>> - clear_bit(irq, irqs_resend);
>> - desc = irq_to_desc(irq);
>> - if (!desc)
>> - continue;
>> - local_irq_disable();
>> +
>> + raw_spin_lock_irq(&irq_resend_lock);
>> + while (!hlist_empty(&irq_resend_list)) {
>> + desc = hlist_entry(irq_resend_list.first, struct irq_desc,
>> + resend_node);
>> + hlist_del_init(&desc->resend_node);
>> + raw_spin_unlock(&irq_resend_lock);
>> desc->handle_irq(desc);
>> - local_irq_enable();
>> + raw_spin_lock(&irq_resend_lock);
>
> What makes it safe to drop the local_irq_*able()?
>
> tasklet_action_common() explicitly enables interrupts when calling the
> callback, so unless there is some other interrupt disabling that I
> can't immediately spot, the handler may run in the wrong context.
>

Unless I am overlooking something, interrupts are disabled within the while
loop unless desc->handle_irq() is enabling it. The existing code disables
and enables interrupts for each handler invocation, whereas the modified
code does it only once for all outstanding handlers.

Please let me know if the below code changes are acceptable. Similar to the
current interrupt disable/enable logic except the protection lock resend.

static void resend_irqs(struct tasklet_struct *unused)
{
struct irq_desc *desc;
- int irq;
-
- while (!bitmap_empty(irqs_resend, nr_irqs)) {
- irq = find_first_bit(irqs_resend, nr_irqs);
- clear_bit(irq, irqs_resend);
- desc = irq_to_desc(irq);
- if (!desc)
- continue;
+
+ raw_spin_lock(&irq_resend_lock);
+ while (!hlist_empty(&irq_resend_list)) {
+ desc = hlist_entry(irq_resend_list.first, struct irq_desc,
+ resend_node);
+ hlist_del_init(&desc->resend_node);
+ raw_spin_unlock(&irq_resend_lock);
+
local_irq_disable();
desc->handle_irq(desc);
local_irq_enable();
+
+ raw_spin_lock(&irq_resend_lock);
}
+ raw_spin_unlock(&irq_resend_lock);
}

> If there is such interrupt disabling, then please point it out in the
> commit message so that idiots like me can refer to it.
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.

2023-04-10 10:11:37

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] genirq: Use hlist for managing resend handlers

On Sun, 09 Apr 2023 13:00:27 +0100,
Shanker Donthineni <[email protected]> wrote:
>
> >> @@ -30,18 +31,17 @@ static DECLARE_BITMAP(irqs_resend, IRQ_BITMAP_BITS);
> >> static void resend_irqs(struct tasklet_struct *unused)
> >> {
> >> struct irq_desc *desc;
> >> - int irq;
> >> -
> >> - while (!bitmap_empty(irqs_resend, nr_irqs)) {
> >> - irq = find_first_bit(irqs_resend, nr_irqs);
> >> - clear_bit(irq, irqs_resend);
> >> - desc = irq_to_desc(irq);
> >> - if (!desc)
> >> - continue;
> >> - local_irq_disable();
> >> +
> >> + raw_spin_lock_irq(&irq_resend_lock);
> >> + while (!hlist_empty(&irq_resend_list)) {
> >> + desc = hlist_entry(irq_resend_list.first, struct irq_desc,
> >> + resend_node);
> >> + hlist_del_init(&desc->resend_node);
> >> + raw_spin_unlock(&irq_resend_lock);
> >> desc->handle_irq(desc);
> >> - local_irq_enable();
> >> + raw_spin_lock(&irq_resend_lock);
> >
> > What makes it safe to drop the local_irq_*able()?
> >
> > tasklet_action_common() explicitly enables interrupts when calling the
> > callback, so unless there is some other interrupt disabling that I
> > can't immediately spot, the handler may run in the wrong context.
> >
>
> Unless I am overlooking something, interrupts are disabled within the while
> loop unless desc->handle_irq() is enabling it. The existing code disables
> and enables interrupts for each handler invocation, whereas the modified
> code does it only once for all outstanding handlers.

Ah, you use raw_spinlock_irq() outside of the loop. I somehow glanced
over that, apologies for the noise. Unless we expect a really long
list of interrupts to be resent, your current code should be OK.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2023-04-10 12:02:41

by Shanker Donthineni

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] genirq: Use hlist for managing resend handlers



On 4/10/23 05:05, Marc Zyngier wrote:
> External email: Use caution opening links or attachments
>
>
> On Sun, 09 Apr 2023 13:00:27 +0100,
> Shanker Donthineni <[email protected]> wrote:
>>
>>>> @@ -30,18 +31,17 @@ static DECLARE_BITMAP(irqs_resend, IRQ_BITMAP_BITS);
>>>> static void resend_irqs(struct tasklet_struct *unused)
>>>> {
>>>> struct irq_desc *desc;
>>>> - int irq;
>>>> -
>>>> - while (!bitmap_empty(irqs_resend, nr_irqs)) {
>>>> - irq = find_first_bit(irqs_resend, nr_irqs);
>>>> - clear_bit(irq, irqs_resend);
>>>> - desc = irq_to_desc(irq);
>>>> - if (!desc)
>>>> - continue;
>>>> - local_irq_disable();
>>>> +
>>>> + raw_spin_lock_irq(&irq_resend_lock);
>>>> + while (!hlist_empty(&irq_resend_list)) {
>>>> + desc = hlist_entry(irq_resend_list.first, struct irq_desc,
>>>> + resend_node);
>>>> + hlist_del_init(&desc->resend_node);
>>>> + raw_spin_unlock(&irq_resend_lock);
>>>> desc->handle_irq(desc);
>>>> - local_irq_enable();
>>>> + raw_spin_lock(&irq_resend_lock);
>>>
>>> What makes it safe to drop the local_irq_*able()?
>>>
>>> tasklet_action_common() explicitly enables interrupts when calling the
>>> callback, so unless there is some other interrupt disabling that I
>>> can't immediately spot, the handler may run in the wrong context.
>>>
>>
>> Unless I am overlooking something, interrupts are disabled within the while
>> loop unless desc->handle_irq() is enabling it. The existing code disables
>> and enables interrupts for each handler invocation, whereas the modified
>> code does it only once for all outstanding handlers.
>
> Ah, you use raw_spinlock_irq() outside of the loop. I somehow glanced
> over that, apologies for the noise. Unless we expect a really long
> list of interrupts to be resent, your current code should be OK.
>

Thanks, I'll post v3 patches to address the other review comments.