2022-04-22 13:53:29

by Pingfan Liu

[permalink] [raw]
Subject: [PATCH 0/9] trival fix or improvement about irq_desc access

When I went through the access of irq_to_desc(), I found the following
trival issue or improvement can be done.
[1-4/9]: clean up or small improvement
[5-6/9]: bugfix for access to irq_desc under releasable context
[7/9]: clean up
[8-9/9]: bugfix for irq debugfs


Cc: Alexander Gordeev <[email protected]>
Cc: Baokun Li <[email protected]>
Cc: Bartosz Golaszewski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "Cédric Le Goater" <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dongli Zhang <[email protected]>
Cc: Florian Fainelli <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: "Maciej W. Rozycki" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Oleksandr Natalenko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yuan ZhaoXiong <[email protected]>
Cc: YueHaibing <[email protected]>
To: [email protected]


Pingfan Liu (9):
irq/irqdesc: put the lock at the exact place in irq_sysfs_init()
irq/irqdesc: change the name of delete_irq_desc() to irq_delete_desc()
irq/manage: remove some unreferenced code
s390/irq: utilize RCU instead of irq_lock_sparse() in
show_msi_interrupt()
x86/irq: place for_each_active_irq() in rcu read section
pm/irq: make for_each_irq_desc() safe of irq_desc release
irq: remove needless lock in takedown_cpu()
irq: make irq_lock_sparse() independent of CONFIG_SPARSE_IRQ
irq/irqdesc: rename sparse_irq_lock to bitmap_lock

.clang-format | 1 -
arch/s390/kernel/irq.c | 11 ++++----
arch/x86/kernel/apic/io_apic.c | 3 +++
include/linux/irq.h | 1 -
include/linux/irqdesc.h | 10 +++-----
include/linux/irqnr.h | 3 ---
kernel/cpu.c | 15 ++++-------
kernel/irq/cpuhotplug.c | 4 +--
kernel/irq/debugfs.c | 4 +--
kernel/irq/irqdesc.c | 47 +++++++++++++++++-----------------
kernel/irq/manage.c | 15 -----------
kernel/irq/pm.c | 3 +++
12 files changed, 47 insertions(+), 70 deletions(-)

--
2.31.1


2022-04-22 18:27:23

by Pingfan Liu

[permalink] [raw]
Subject: [PATCH 7/9] irq: remove needless lock in takedown_cpu()

Any allocation or release of irq_desc should fall into the interface of
__irq_alloc_descs() or irq_free_descs(). And both of them hold the mutex
sparse_irq_lock. This preemptability contradicts the preempt-disabled
context when dispatching fn in cpu_stopper_thread(). So allocation or
free of irq_desc can not be demanded in cpu_stopper_thread().

On the other hand, for the safety of access to irq_desc, rcu still keeps
watching the dying cpu until the last minute rcu_report_dead(), so each
cpu_stop_fn_t can safely access irq_desc.

As a result, in takedown_cpu() irq_lock_sparse()/irq_unlock_sparse() can
be safely removed.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Baokun Li <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Steven Price <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Yuan ZhaoXiong <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: Randy Dunlap <[email protected]>
To: [email protected]
---
kernel/cpu.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d0a9aa0b42e8..94a6b512c26d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1033,18 +1033,16 @@ static int takedown_cpu(unsigned int cpu)
kthread_park(st->thread);

/*
- * Prevent irq alloc/free while the dying cpu reorganizes the
- * interrupt affinities.
+ * RCU keeps watching 'cpu' until do_idle()->rcu_report_dead().
+ * And cpu_stopper's fn is dispatched with preemption disabled.
+ * So it can not occur to release a irq_desc.
*/
- irq_lock_sparse();

/*
* So now all preempt/rcu users must observe !cpu_active().
*/
err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu));
if (err) {
- /* CPU refused to die */
- irq_unlock_sparse();
/* Unpark the hotplug thread so we can rollback there */
kthread_unpark(st->thread);
return err;
@@ -1061,9 +1059,6 @@ static int takedown_cpu(unsigned int cpu)
wait_for_ap_thread(st, false);
BUG_ON(st->state != CPUHP_AP_IDLE_DEAD);

- /* Interrupts are moved away from the dying cpu, reenable alloc/free */
- irq_unlock_sparse();
-
hotplug_cpu__broadcast_tick_pull(cpu);
/* This actually kills the CPU. */
__cpu_die(cpu);
--
2.31.1

2022-04-22 19:47:17

by Pingfan Liu

[permalink] [raw]
Subject: [PATCH 9/9] irq/irqdesc: rename sparse_irq_lock to bitmap_lock

Now that sparse_irq_lock serves for both sparse and non-sparse purpose,
it is better to rename it as bitmap_lock. Corresponding, rename
irq_lock_sparse() as irq_lock_bitmap().

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Oleksandr Natalenko <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Yuan ZhaoXiong <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Dongli Zhang <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: Steven Price <[email protected]>
To: [email protected]
---
include/linux/irqdesc.h | 4 ++--
kernel/cpu.c | 4 ++--
kernel/irq/cpuhotplug.c | 4 ++--
kernel/irq/debugfs.c | 4 ++--
kernel/irq/irqdesc.c | 26 +++++++++++++-------------
kernel/irq/pm.c | 4 ++--
6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 6c01231fec00..8301653fc2b2 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -104,8 +104,8 @@ struct irq_desc {
const char *name;
} ____cacheline_internodealigned_in_smp;

-extern void irq_lock_sparse(void);
-extern void irq_unlock_sparse(void);
+extern void irq_lock_bitmap(void);
+extern void irq_unlock_bitmap(void);

#ifndef CONFIG_SPARSE_IRQ
extern struct irq_desc irq_desc[NR_IRQS];
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 94a6b512c26d..8874f0242893 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -601,11 +601,11 @@ static int bringup_cpu(unsigned int cpu)
* setup the vector space for the cpu which comes online.
* Prevent irq alloc/free across the bringup.
*/
- irq_lock_sparse();
+ irq_lock_bitmap();

/* Arch-specific enabling code. */
ret = __cpu_up(cpu, idle);
- irq_unlock_sparse();
+ irq_unlock_bitmap();
if (ret)
return ret;
return bringup_wait_for_ap(cpu);
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 1ed2b1739363..84547b5781be 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -220,14 +220,14 @@ int irq_affinity_online_cpu(unsigned int cpu)
struct irq_desc *desc;
unsigned int irq;

- irq_lock_sparse();
+ irq_lock_bitmap();
for_each_active_irq(irq) {
desc = irq_to_desc(irq);
raw_spin_lock_irq(&desc->lock);
irq_restore_affinity_of_irq(desc, cpu);
raw_spin_unlock_irq(&desc->lock);
}
- irq_unlock_sparse();
+ irq_unlock_bitmap();

return 0;
}
diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c
index 2b43f5f5033d..af4bd2ca2372 100644
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -251,10 +251,10 @@ static int __init irq_debugfs_init(void)

irq_dir = debugfs_create_dir("irqs", root_dir);

- irq_lock_sparse();
+ irq_lock_bitmap();
for_each_active_irq(irq)
irq_add_debugfs_entry(irq, irq_to_desc(irq));
- irq_unlock_sparse();
+ irq_unlock_bitmap();

return 0;
}
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index a5cefd7c9ef7..acd80cb8a511 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -130,7 +130,7 @@ static void desc_set_defaults(unsigned int irq, struct irq_desc *desc, int node,
int nr_irqs = NR_IRQS;
EXPORT_SYMBOL_GPL(nr_irqs);

-static DEFINE_MUTEX(sparse_irq_lock);
+static DEFINE_MUTEX(bitmap_lock);
static DECLARE_BITMAP(allocated_irqs, IRQ_BITMAP_BITS);

#ifdef CONFIG_SPARSE_IRQ
@@ -318,12 +318,12 @@ static int __init irq_sysfs_init(void)
}

/* Prevent concurrent irq alloc/free */
- irq_lock_sparse();
+ irq_lock_bitmap();

/* Add the already allocated interrupts */
for_each_irq_desc(irq, desc)
irq_sysfs_add(irq, desc);
- irq_unlock_sparse();
+ irq_unlock_bitmap();

return 0;
}
@@ -607,9 +607,9 @@ static int irq_expand_nr_irqs(unsigned int nr)

void irq_mark_irq(unsigned int irq)
{
- mutex_lock(&sparse_irq_lock);
+ mutex_lock(&bitmap_lock);
bitmap_set(allocated_irqs, irq, 1);
- mutex_unlock(&sparse_irq_lock);
+ mutex_unlock(&bitmap_lock);
}

#ifdef CONFIG_GENERIC_IRQ_LEGACY
@@ -711,14 +711,14 @@ int generic_handle_domain_nmi(struct irq_domain *domain, unsigned int hwirq)
}
#endif

-void irq_lock_sparse(void)
+void irq_lock_bitmap(void)
{
- mutex_lock(&sparse_irq_lock);
+ mutex_lock(&bitmap_lock);
}

-void irq_unlock_sparse(void)
+void irq_unlock_bitmap(void)
{
- mutex_unlock(&sparse_irq_lock);
+ mutex_unlock(&bitmap_lock);
}

/* Dynamic interrupt handling */
@@ -735,12 +735,12 @@ void irq_free_descs(unsigned int from, unsigned int cnt)
if (from >= nr_irqs || (from + cnt) > nr_irqs)
return;

- mutex_lock(&sparse_irq_lock);
+ mutex_lock(&bitmap_lock);
for (i = 0; i < cnt; i++)
free_desc(from + i);

bitmap_clear(allocated_irqs, from, cnt);
- mutex_unlock(&sparse_irq_lock);
+ mutex_unlock(&bitmap_lock);
}
EXPORT_SYMBOL_GPL(irq_free_descs);

@@ -779,7 +779,7 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
from = arch_dynirq_lower_bound(from);
}

- mutex_lock(&sparse_irq_lock);
+ mutex_lock(&bitmap_lock);

start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
from, cnt, 0);
@@ -794,7 +794,7 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
}
ret = alloc_descs(start, cnt, node, affinity, owner);
unlock:
- mutex_unlock(&sparse_irq_lock);
+ mutex_unlock(&bitmap_lock);
return ret;
}
EXPORT_SYMBOL_GPL(__irq_alloc_descs);
diff --git a/kernel/irq/pm.c b/kernel/irq/pm.c
index 4b67a4c7de3c..9b81a738d84f 100644
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -133,7 +133,7 @@ void suspend_device_irqs(void)
struct irq_desc *desc;
int irq;

- irq_lock_sparse();
+ irq_lock_bitmap();
for_each_irq_desc(irq, desc) {
unsigned long flags;
bool sync;
@@ -147,7 +147,7 @@ void suspend_device_irqs(void)
if (sync)
synchronize_irq(irq);
}
- irq_unlock_sparse();
+ irq_unlock_bitmap();
}
EXPORT_SYMBOL_GPL(suspend_device_irqs);

--
2.31.1

2022-04-22 20:17:58

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 7/9] irq: remove needless lock in takedown_cpu()

On Wed, Apr 20 2022 at 22:05, Pingfan Liu wrote:

First of all, the subject prefix for the core interrupt subsystem is
'genirq' and the sentence after the colon starts with an uppercase
letter. See:

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html

> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index d0a9aa0b42e8..94a6b512c26d 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1033,18 +1033,16 @@ static int takedown_cpu(unsigned int cpu)
> kthread_park(st->thread);
>
> /*
> - * Prevent irq alloc/free while the dying cpu reorganizes the
> - * interrupt affinities.
> + * RCU keeps watching 'cpu' until do_idle()->rcu_report_dead().
> + * And cpu_stopper's fn is dispatched with preemption disabled.
> + * So it can not occur to release a irq_desc.
> */
> - irq_lock_sparse();

Not everything is about RCU here. You really need to look at all moving
parts:

irq_migrate_all_off_this_cpu() relies on the allocated_irqs bitmap and
the sparse tree to be in consistent state, which is only guaranteed when
the sparse lock is held.

I'm not sure what you are trying to solve here. Not taking sparse_irq_lock
here is not gaining anything.

Thanks,

tglx

2022-04-22 20:27:23

by Pingfan Liu

[permalink] [raw]
Subject: [PATCH 4/9] s390/irq: utilize RCU instead of irq_lock_sparse() in show_msi_interrupt()

irq_desc can be accessed safely in RCU read section as demonstrated by
kstat_irqs_usr(). And raw_spin_lock_irqsave() context can provide a rcu
read section, which can be utilized to get rid of irq_lock_sparse().

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
To: [email protected]
---
arch/s390/kernel/irq.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
index 3033f616e256..6302dc7874cf 100644
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -205,12 +205,13 @@ static void show_msi_interrupt(struct seq_file *p, int irq)
unsigned long flags;
int cpu;

- irq_lock_sparse();
+ raw_spin_lock_irqsave(&desc->lock, flags);
desc = irq_to_desc(irq);
- if (!desc)
- goto out;
+ if (!desc) {
+ raw_spin_unlock_irqrestore(&desc->lock, flags);
+ return;
+ }

- raw_spin_lock_irqsave(&desc->lock, flags);
seq_printf(p, "%3d: ", irq);
for_each_online_cpu(cpu)
seq_printf(p, "%10u ", irq_desc_kstat_cpu(desc, cpu));
@@ -223,8 +224,6 @@ static void show_msi_interrupt(struct seq_file *p, int irq)

seq_putc(p, '\n');
raw_spin_unlock_irqrestore(&desc->lock, flags);
-out:
- irq_unlock_sparse();
}

/*
--
2.31.1

2022-04-22 20:29:48

by Pingfan Liu

[permalink] [raw]
Subject: [PATCH 1/9] irq/irqdesc: put the lock at the exact place in irq_sysfs_init()

sparse_irq_lock is used to protect irq_desc, and furthermore
irq_sysfs_init() is called only once, so put irq_lock_sparse() at the
exact place, where the race condition may start.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
To: [email protected]
---
kernel/irq/irqdesc.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 939d21cd55c3..8d0982233277 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -312,15 +312,14 @@ static int __init irq_sysfs_init(void)
struct irq_desc *desc;
int irq;

- /* Prevent concurrent irq alloc/free */
- irq_lock_sparse();
-
irq_kobj_base = kobject_create_and_add("irq", kernel_kobj);
if (!irq_kobj_base) {
- irq_unlock_sparse();
return -ENOMEM;
}

+ /* Prevent concurrent irq alloc/free */
+ irq_lock_sparse();
+
/* Add the already allocated interrupts */
for_each_irq_desc(irq, desc)
irq_sysfs_add(irq, desc);
--
2.31.1

2022-04-22 20:32:52

by Pingfan Liu

[permalink] [raw]
Subject: [PATCH 2/9] irq/irqdesc: change the name of delete_irq_desc() to irq_delete_desc()

Make the naming convention consistent with irq_insert_desc(),
irq_to_desc().

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
To: [email protected]
---
kernel/irq/irqdesc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 8d0982233277..9feedaa08430 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -355,7 +355,7 @@ struct irq_desc *irq_to_desc(unsigned int irq)
EXPORT_SYMBOL_GPL(irq_to_desc);
#endif

-static void delete_irq_desc(unsigned int irq)
+static void irq_delete_desc(unsigned int irq)
{
radix_tree_delete(&irq_desc_tree, irq);
}
@@ -453,7 +453,7 @@ static void free_desc(unsigned int irq)
* irq_sysfs_init() as well.
*/
irq_sysfs_del(desc);
- delete_irq_desc(irq);
+ irq_delete_desc(irq);

/*
* We free the descriptor, masks and stat fields via RCU. That
--
2.31.1

2022-04-22 20:36:32

by Pingfan Liu

[permalink] [raw]
Subject: [PATCH 3/9] irq/manage: remove some unreferenced code

Neither remove_percpu_irq() nor for_each_irq_nr() has any users, so they
can be removed.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Florian Fainelli <[email protected]>
Cc: Bartosz Golaszewski <[email protected]>
Cc: "Cédric Le Goater" <[email protected]>
Cc: [email protected]
To: [email protected]
---
.clang-format | 1 -
include/linux/irq.h | 1 -
include/linux/irqnr.h | 3 ---
kernel/irq/manage.c | 15 ---------------
4 files changed, 20 deletions(-)

diff --git a/.clang-format b/.clang-format
index fa959436bcfd..fb161bc8ca1a 100644
--- a/.clang-format
+++ b/.clang-format
@@ -198,7 +198,6 @@ ForEachMacros:
- 'for_each_if'
- 'for_each_iommu'
- 'for_each_ip_tunnel_rcu'
- - 'for_each_irq_nr'
- 'for_each_link_codecs'
- 'for_each_link_cpus'
- 'for_each_link_platforms'
diff --git a/include/linux/irq.h b/include/linux/irq.h
index f92788ccdba2..ec50007decfc 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -603,7 +603,6 @@ enum {

struct irqaction;
extern int setup_percpu_irq(unsigned int irq, struct irqaction *new);
-extern void remove_percpu_irq(unsigned int irq, struct irqaction *act);

#ifdef CONFIG_DEPRECATED_IRQ_CPU_ONOFFLINE
extern void irq_cpu_online(void);
diff --git a/include/linux/irqnr.h b/include/linux/irqnr.h
index 3496baa0b07f..082fe0856e87 100644
--- a/include/linux/irqnr.h
+++ b/include/linux/irqnr.h
@@ -28,7 +28,4 @@ unsigned int irq_get_next_irq(unsigned int offset);
for (irq = irq_get_next_irq(0); irq < nr_irqs; \
irq = irq_get_next_irq(irq + 1))

-#define for_each_irq_nr(irq) \
- for (irq = 0; irq < nr_irqs; irq++)
-
#endif
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c03f71d5ec10..b7c86be68b58 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2466,21 +2466,6 @@ static struct irqaction *__free_percpu_irq(unsigned int irq, void __percpu *dev_
return NULL;
}

-/**
- * remove_percpu_irq - free a per-cpu interrupt
- * @irq: Interrupt line to free
- * @act: irqaction for the interrupt
- *
- * Used to remove interrupts statically setup by the early boot process.
- */
-void remove_percpu_irq(unsigned int irq, struct irqaction *act)
-{
- struct irq_desc *desc = irq_to_desc(irq);
-
- if (desc && irq_settings_is_per_cpu_devid(desc))
- __free_percpu_irq(irq, act->percpu_dev_id);
-}
-
/**
* free_percpu_irq - free an interrupt allocated with request_percpu_irq
* @irq: Interrupt line to free
--
2.31.1

2022-04-22 20:42:18

by Pingfan Liu

[permalink] [raw]
Subject: [PATCH 6/9] pm/irq: make for_each_irq_desc() safe of irq_desc release

The invloved context is no a RCU read section. Furthermore there may be
more than one task at this point. Hence it demands a measure to prevent
irq_desc from freeing. Use irq_lock_sparse to serve the protection
purpose.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
To: [email protected]
---
kernel/irq/pm.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/irq/pm.c b/kernel/irq/pm.c
index ca71123a6130..4b67a4c7de3c 100644
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -133,6 +133,7 @@ void suspend_device_irqs(void)
struct irq_desc *desc;
int irq;

+ irq_lock_sparse();
for_each_irq_desc(irq, desc) {
unsigned long flags;
bool sync;
@@ -146,6 +147,7 @@ void suspend_device_irqs(void)
if (sync)
synchronize_irq(irq);
}
+ irq_unlock_sparse();
}
EXPORT_SYMBOL_GPL(suspend_device_irqs);

@@ -186,6 +188,7 @@ static void resume_irqs(bool want_early)
struct irq_desc *desc;
int irq;

+ /* The early resume stage is free of irq_desc release */
for_each_irq_desc(irq, desc) {
unsigned long flags;
bool is_early = desc->action &&
--
2.31.1

2022-04-22 20:47:19

by Pingfan Liu

[permalink] [raw]
Subject: [PATCH 5/9] x86/irq: place for_each_active_irq() in rcu read section

Since there are access to irq_desc, and no preemption is provided at the
involved, it requires rcu read lock to protect irq_desc.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Maciej W. Rozycki" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: [email protected]
To: [email protected]
---
arch/x86/kernel/apic/io_apic.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c1bb384935b0..4bb16edcbe4d 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1333,6 +1333,7 @@ void __init print_IO_APICs(void)
print_IO_APIC(ioapic_idx);

printk(KERN_DEBUG "IRQ to pin mappings:\n");
+ rcu_read_lock();
for_each_active_irq(irq) {
struct irq_pin_list *entry;
struct irq_chip *chip;
@@ -1352,6 +1353,7 @@ void __init print_IO_APICs(void)
pr_cont("-> %d:%d", entry->apic, entry->pin);
pr_cont("\n");
}
+ rcu_read_unlock();

printk(KERN_INFO ".................................... done.\n");
}
@@ -2009,6 +2011,7 @@ static inline void init_IO_APIC_traps(void)
struct irq_cfg *cfg;
unsigned int irq;

+ /* The early boot stage is free of irq_desc release */
for_each_active_irq(irq) {
cfg = irq_cfg(irq);
if (IO_APIC_IRQ(irq) && cfg && !cfg->vector) {
--
2.31.1

2022-04-22 21:18:23

by Pingfan Liu

[permalink] [raw]
Subject: [PATCHv2] s390/irq: utilize RCU instead of irq_lock_sparse() in show_msi_interrupt()

As demonstrated by commit 74bdf7815dfb ("genirq: Speedup show_interrupts()"),
irq_desc can be accessed safely in RCU read section.

Hence here resorting to rcu read lock to get rid of irq_lock_sparse().

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
To: [email protected]

---
arch/s390/kernel/irq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
index 3033f616e256..45393919fe61 100644
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -205,7 +205,7 @@ static void show_msi_interrupt(struct seq_file *p, int irq)
unsigned long flags;
int cpu;

- irq_lock_sparse();
+ rcu_read_lock();
desc = irq_to_desc(irq);
if (!desc)
goto out;
@@ -224,7 +224,7 @@ static void show_msi_interrupt(struct seq_file *p, int irq)
seq_putc(p, '\n');
raw_spin_unlock_irqrestore(&desc->lock, flags);
out:
- irq_unlock_sparse();
+ rcu_read_unlock();
}

/*
--
2.31.1

2022-04-22 21:29:52

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 6/9] pm/irq: make for_each_irq_desc() safe of irq_desc release

On Wed, Apr 20, 2022 at 4:06 PM Pingfan Liu <[email protected]> wrote:
>
> The invloved context is no a RCU read section. Furthermore there may be
> more than one task at this point. Hence it demands a measure to prevent
> irq_desc from freeing. Use irq_lock_sparse to serve the protection
> purpose.

Can you please describe an example scenario in which the added locking
will prevent a failure from occurring?

> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> To: [email protected]
> ---
> kernel/irq/pm.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/irq/pm.c b/kernel/irq/pm.c
> index ca71123a6130..4b67a4c7de3c 100644
> --- a/kernel/irq/pm.c
> +++ b/kernel/irq/pm.c
> @@ -133,6 +133,7 @@ void suspend_device_irqs(void)
> struct irq_desc *desc;
> int irq;
>
> + irq_lock_sparse();
> for_each_irq_desc(irq, desc) {
> unsigned long flags;
> bool sync;
> @@ -146,6 +147,7 @@ void suspend_device_irqs(void)
> if (sync)
> synchronize_irq(irq);
> }
> + irq_unlock_sparse();
> }
> EXPORT_SYMBOL_GPL(suspend_device_irqs);
>
> @@ -186,6 +188,7 @@ static void resume_irqs(bool want_early)
> struct irq_desc *desc;
> int irq;
>
> + /* The early resume stage is free of irq_desc release */
> for_each_irq_desc(irq, desc) {
> unsigned long flags;
> bool is_early = desc->action &&
> --
> 2.31.1
>

2022-04-22 21:46:52

by Miguel Ojeda

[permalink] [raw]
Subject: Re: [PATCH 3/9] irq/manage: remove some unreferenced code

On Wed, Apr 20, 2022 at 4:05 PM Pingfan Liu <[email protected]> wrote:
>
> Neither remove_percpu_irq() nor for_each_irq_nr() has any users, so they
> can be removed.

For `.clang-format`:

Acked-by: Miguel Ojeda <[email protected]>

Cheers,
Miguel

2022-04-22 22:40:40

by Pingfan Liu

[permalink] [raw]
Subject: [PATCH 8/9] irq: make irq_lock_sparse() independent of CONFIG_SPARSE_IRQ

Even in the case of non-sparse irq, the callsite of
for_each_active_irq() in irq_debugfs_init() still cares about the sync
of allocated_irqs bitmap. Otherwise irq_debugfs_init() may show some
disappeared irq if the irq is disactived by other driver in parallel.

As there are only a few callsites of irq_lock_sparse() in the cold path,
which means the slight performance drops can be ignored. Hence moving
irq_lock_sparse() out of CONFIG_SPARSE_IRQ to protect both irq_desc and
allocated_irqs bitmap, instead of making irq_lock_sparse() dependent on
GENERIC_IRQ_DEBUGFS.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Mark Rutland <[email protected]>
To: [email protected]
---
include/linux/irqdesc.h | 6 ++----
kernel/irq/irqdesc.c | 20 ++++++++++----------
2 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index a77584593f7d..6c01231fec00 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -104,12 +104,10 @@ struct irq_desc {
const char *name;
} ____cacheline_internodealigned_in_smp;

-#ifdef CONFIG_SPARSE_IRQ
extern void irq_lock_sparse(void);
extern void irq_unlock_sparse(void);
-#else
-static inline void irq_lock_sparse(void) { }
-static inline void irq_unlock_sparse(void) { }
+
+#ifndef CONFIG_SPARSE_IRQ
extern struct irq_desc irq_desc[NR_IRQS];
#endif

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 9feedaa08430..a5cefd7c9ef7 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -375,16 +375,6 @@ static void free_masks(struct irq_desc *desc)
static inline void free_masks(struct irq_desc *desc) { }
#endif

-void irq_lock_sparse(void)
-{
- mutex_lock(&sparse_irq_lock);
-}
-
-void irq_unlock_sparse(void)
-{
- mutex_unlock(&sparse_irq_lock);
-}
-
static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags,
const struct cpumask *affinity,
struct module *owner)
@@ -721,6 +711,16 @@ int generic_handle_domain_nmi(struct irq_domain *domain, unsigned int hwirq)
}
#endif

+void irq_lock_sparse(void)
+{
+ mutex_lock(&sparse_irq_lock);
+}
+
+void irq_unlock_sparse(void)
+{
+ mutex_unlock(&sparse_irq_lock);
+}
+
/* Dynamic interrupt handling */

/**
--
2.31.1

2022-04-25 07:59:37

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCH 7/9] irq: remove needless lock in takedown_cpu()

On Thu, Apr 21, 2022 at 06:11:56PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 20 2022 at 22:05, Pingfan Liu wrote:
>
> First of all, the subject prefix for the core interrupt subsystem is
> 'genirq' and the sentence after the colon starts with an uppercase
> letter. See:
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
>
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index d0a9aa0b42e8..94a6b512c26d 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -1033,18 +1033,16 @@ static int takedown_cpu(unsigned int cpu)
> > kthread_park(st->thread);
> >
> > /*
> > - * Prevent irq alloc/free while the dying cpu reorganizes the
> > - * interrupt affinities.
> > + * RCU keeps watching 'cpu' until do_idle()->rcu_report_dead().
> > + * And cpu_stopper's fn is dispatched with preemption disabled.
> > + * So it can not occur to release a irq_desc.
> > */
> > - irq_lock_sparse();
>
> Not everything is about RCU here. You really need to look at all moving
> parts:
>
> irq_migrate_all_off_this_cpu() relies on the allocated_irqs bitmap and
> the sparse tree to be in consistent state, which is only guaranteed when
> the sparse lock is held.
>

For the irq which transfer from active to inactive(disappearing) after
fetching, desc->lock can serve the sync purpose. In this case,
irq_lock_sparse() is not needed. For a emergeing irq, I am not sure
about it.

> I'm not sure what you are trying to solve here. Not taking sparse_irq_lock
> here is not gaining anything.
>

It was a big lock preventing my original series to make kexec-reboot parallel on
arm64/riscv platform. But my new series takes a different way. And this
big lock is not a problem any longer.

Thanks for your time.

Regards,

Pingfan

2022-04-25 12:50:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 7/9] irq: remove needless lock in takedown_cpu()

On Mon, Apr 25 2022 at 10:57, Pingfan Liu wrote:
> On Thu, Apr 21, 2022 at 06:11:56PM +0200, Thomas Gleixner wrote:
>> > - irq_lock_sparse();
>>
>> Not everything is about RCU here. You really need to look at all moving
>> parts:
>>
>> irq_migrate_all_off_this_cpu() relies on the allocated_irqs bitmap and
>> the sparse tree to be in consistent state, which is only guaranteed when
>> the sparse lock is held.
>>
>
> For the irq which transfer from active to inactive(disappearing) after
> fetching, desc->lock can serve the sync purpose. In this case,
> irq_lock_sparse() is not needed. For a emergeing irq, I am not sure
> about it.

No, it's required for the free case. The alloc case is
uninteresting. Care to look into the code?

irq_free_descs()
lock(sparse);
free_descs();
bitmap_clear(allocated_irqs, from, cnt);
unlock_sparse);

As free_descs() sets the sparse tree entry to NULL, up to the point
where bitmap_clear() finishes the state is inconsistent.

Now look at irq_migrate_all_off_this_cpu() and figure out what happens
when stop_machine() hits into the inconsistent state.

This can be fixed, but not by making mysterious claims about RCU and
desc->lock.

Thanks,

tglx

2022-04-26 06:08:49

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCHv2] s390/irq: utilize RCU instead of irq_lock_sparse() in show_msi_interrupt()

On Fri, Apr 22, 2022 at 06:02:12PM +0800, Pingfan Liu wrote:
> As demonstrated by commit 74bdf7815dfb ("genirq: Speedup show_interrupts()"),
> irq_desc can be accessed safely in RCU read section.
>
> Hence here resorting to rcu read lock to get rid of irq_lock_sparse().
>
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Heiko Carstens <[email protected]>
> Cc: Vasily Gorbik <[email protected]>
> Cc: Alexander Gordeev <[email protected]>
> Cc: Christian Borntraeger <[email protected]>
> Cc: Sven Schnelle <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: [email protected]
> To: [email protected]
>
> ---
> arch/s390/kernel/irq.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

Applied, thanks!

2022-04-27 09:50:52

by Pingfan Liu

[permalink] [raw]
Subject: [PATCHv2] genirq/PM: Make for_each_irq_desc() safe of irq_desc release

First, this is a suspicion of the code, not a really encountered bug.

*** The scenario ***

Two threads involved
threadA "hibernate" runs suspend_device_irqs()
threadB "rcu_cpu_kthread" runs rcu_core()->rcu_do_batch(), which releases
object, let's say irq_desc

Zoom in:
threadA threadB
for_each_irq_desc(irq, desc) {
get irq_descA which is under freeing
--->preempted by rcu_core()->rcu_do_batch() which releases irq_descA
raw_spin_lock_irqsave(&desc->lock, flags);
//Oops

And since in the involved code piece, suspend_device_irqs() runs in a
preemptible context, and there may be more than one thread at this
point. So the preemption can happen.

*** The fix ***

Since there is a blockable synchronize_irq() inside the code piece,
resorting to irq_lock_sparse() to protect the irq_desc from
disappearing.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
To: [email protected]
---
v1 -> v2: improve commit log
kernel/irq/pm.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/irq/pm.c b/kernel/irq/pm.c
index ca71123a6130..4b67a4c7de3c 100644
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -133,6 +133,7 @@ void suspend_device_irqs(void)
struct irq_desc *desc;
int irq;

+ irq_lock_sparse();
for_each_irq_desc(irq, desc) {
unsigned long flags;
bool sync;
@@ -146,6 +147,7 @@ void suspend_device_irqs(void)
if (sync)
synchronize_irq(irq);
}
+ irq_unlock_sparse();
}
EXPORT_SYMBOL_GPL(suspend_device_irqs);

@@ -186,6 +188,7 @@ static void resume_irqs(bool want_early)
struct irq_desc *desc;
int irq;

+ /* The early resume stage is free of irq_desc release */
for_each_irq_desc(irq, desc) {
unsigned long flags;
bool is_early = desc->action &&
--
2.31.1

2022-04-27 10:41:36

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCH 7/9] irq: remove needless lock in takedown_cpu()

On Mon, Apr 25, 2022 at 11:43:03AM +0200, Thomas Gleixner wrote:
> On Mon, Apr 25 2022 at 10:57, Pingfan Liu wrote:
> > On Thu, Apr 21, 2022 at 06:11:56PM +0200, Thomas Gleixner wrote:
> >> > - irq_lock_sparse();
> >>
> >> Not everything is about RCU here. You really need to look at all moving
> >> parts:
> >>
> >> irq_migrate_all_off_this_cpu() relies on the allocated_irqs bitmap and
> >> the sparse tree to be in consistent state, which is only guaranteed when
> >> the sparse lock is held.
> >>
> >
> > For the irq which transfer from active to inactive(disappearing) after
> > fetching, desc->lock can serve the sync purpose. In this case,
> > irq_lock_sparse() is not needed. For a emergeing irq, I am not sure
> > about it.
>
> No, it's required for the free case. The alloc case is
> uninteresting. Care to look into the code?
>

Yes, it is a good exercise. Thanks for the enlightenment.

> irq_free_descs()
> lock(sparse);
> free_descs();
> bitmap_clear(allocated_irqs, from, cnt);
> unlock_sparse);
>
> As free_descs() sets the sparse tree entry to NULL, up to the point
> where bitmap_clear() finishes the state is inconsistent.
>
> Now look at irq_migrate_all_off_this_cpu() and figure out what happens
> when stop_machine() hits into the inconsistent state.
>

So the following code should fix the inconsistence between bitmap and
sparse tree.
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 1ed2b1739363..cd0d180f082d 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -161,6 +161,8 @@ void irq_migrate_all_off_this_cpu(void)
bool affinity_broken;

desc = irq_to_desc(irq);
+ if (!desc)
+ continue;
raw_spin_lock(&desc->lock);
affinity_broken = migrate_one_irq(desc);
raw_spin_unlock(&desc->lock);

> This can be fixed, but not by making mysterious claims about RCU and
> desc->lock.
>

But I still think that desc->lock is critical to the consistence of the
irq _affinity_ if removing sparse lock in takedown_cpu().

For the free case, after applying the above patch, it should work.
void irq_migrate_all_off_this_cpu(void)
{
for_each_active_irq(irq) {

desc = irq_to_desc(irq);
if (!desc)
continue;
---> if breaking
in by free, then
migrate_one_irq()
will skip it
since the irq is
not activated any
long
raw_spin_lock(&desc->lock);
affinity_broken = migrate_one_irq(desc);
raw_spin_unlock(&desc->lock);
...
}
}

But for the alloc case, it could be a problem.
void irq_migrate_all_off_this_cpu(void)
{
for_each_active_irq(irq) {

desc = irq_to_desc(irq);
if (!desc)
continue;
raw_spin_lock(&desc->lock);
affinity_broken = migrate_one_irq(desc);
raw_spin_unlock(&desc->lock);
...
---> any new irq will
not be detected. But alloc_descs(start, cnt, node, affinity)
still associate the
irq with this cpu.
There is _no_
opportunity to clear
out this cpu from
desc->irq_common_data.affinity.

This is the affinity
inconsistent problem.
}


Thanks,

Pingfan