Since there are holes in the irq number allocation, we can jump over
those holes to speed up show_interrupts(). In addition, the percpu
kstat_irqs access logic can be refined.
System Configuration
====================
* 2-socket server with 488 cores (HT-enabled).
* The last allocated irq is 508.
* nr_irqs = 8360. The following is from dmesg.
NR_IRQS: 524544, nr_irqs: 8360, preallocated irqs: 16
The biggest hole spans from irq 509 to nr_irqs: 7852 iterations
(8360 - 509 + 1) are unnecessary.
Test Result
===========
* The following result is the average execution time over ten runs
  of `time cat /proc/interrupts`.
no patch (ms) patched (ms) saved
------------- ------------ -------
52.4 47.3 9.7%
Adrian Huang (2):
genirq/proc: Try to jump over the unallocated irq hole whenever
possible
genirq/proc: Refine percpu kstat_irqs access logic
fs/proc/interrupts.c | 6 ++++++
kernel/irq/proc.c | 26 ++++++++++++++++++--------
2 files changed, 24 insertions(+), 8 deletions(-)
--
2.25.1
From: Adrian Huang <[email protected]>
The current approach blindly iterates over irq numbers until the number
is greater than 'nr_irqs', checking whether each irq is allocated. This
is inefficient if there is a big hole (from the last allocated irq
number to nr_irqs) in the system.

The solution is to jump over the unallocated irq hole whenever an
unallocated irq is detected.
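
For reference, here is a sketch of the resulting iterator (assuming the
current mainline int_seq_next() as the base; the added lines match the
diff below). irq_get_next_irq() returns the next allocated irq number,
or nr_irqs when no further irq is allocated:

        static void *int_seq_next(struct seq_file *f, void *v, loff_t *pos)
        {
                (*pos)++;
                if (*pos > nr_irqs)
                        return NULL;

                /*
                 * If *pos is an unallocated irq, jump directly to the
                 * next allocated one. irq_get_next_irq() returns nr_irqs
                 * when the hole extends to the end, which terminates the
                 * sequence on the following call.
                 */
                rcu_read_lock();
                if (!irq_to_desc(*pos))
                        *pos = irq_get_next_irq(*pos);
                rcu_read_unlock();

                return pos;
        }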
Tested-by: Jiwei Sun <[email protected]>
Signed-off-by: Adrian Huang <[email protected]>
---
fs/proc/interrupts.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/proc/interrupts.c b/fs/proc/interrupts.c
index cb0edc7cbf09..111ea8a3c553 100644
--- a/fs/proc/interrupts.c
+++ b/fs/proc/interrupts.c
@@ -19,6 +19,12 @@ static void *int_seq_next(struct seq_file *f, void *v, loff_t *pos)
(*pos)++;
if (*pos > nr_irqs)
return NULL;
+
+ rcu_read_lock();
+ if (!irq_to_desc(*pos))
+ *pos = irq_get_next_irq(*pos);
+ rcu_read_unlock();
+
return pos;
}
--
2.25.1
From: Adrian Huang <[email protected]>
There is no need to accumulate the kstat_irqs of all CPUs to determine
whether the corresponding irq should be printed: stop the iteration as
soon as one per-CPU counter is nonzero.

In addition, there is no need to check whether the kstat_irqs pointer
is valid on every loop iteration when printing the per-CPU irq
statistics; check it once before the loop instead.
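
For reference, the change in the accumulation loop (both snippets are
taken from the diff below):

        /* Before: unconditionally sums across all online CPUs */
        for_each_online_cpu(j)
                any_count |= data_race(*per_cpu_ptr(desc->kstat_irqs, j));

        /* After: bail out at the first nonzero per-CPU counter */
        for_each_online_cpu(j) {
                if (data_race(*per_cpu_ptr(desc->kstat_irqs, j))) {
                        print_irq = 1;
                        break;
                }
        }

On a 488-core system this can avoid touching up to 487 remote per-CPU
cachelines for every irq that has seen at least one interrupt.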
Tested-by: Jiwei Sun <[email protected]>
Signed-off-by: Adrian Huang <[email protected]>
---
kernel/irq/proc.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 623b8136e9af..bfa341fac687 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -461,7 +461,7 @@ int show_interrupts(struct seq_file *p, void *v)
{
static int prec;
- unsigned long flags, any_count = 0;
+ unsigned long flags, print_irq = 1;
int i = *(loff_t *) v, j;
struct irqaction *action;
struct irq_desc *desc;
@@ -488,18 +488,28 @@ int show_interrupts(struct seq_file *p, void *v)
if (!desc || irq_settings_is_hidden(desc))
goto outsparse;
- if (desc->kstat_irqs) {
- for_each_online_cpu(j)
- any_count |= data_race(*per_cpu_ptr(desc->kstat_irqs, j));
+ if ((!desc->action || irq_desc_is_chained(desc)) && desc->kstat_irqs) {
+ print_irq = 0;
+ for_each_online_cpu(j) {
+ if (data_race(*per_cpu_ptr(desc->kstat_irqs, j))) {
+ print_irq = 1;
+ break;
+ }
+ }
}
- if ((!desc->action || irq_desc_is_chained(desc)) && !any_count)
+ if (!print_irq)
goto outsparse;
seq_printf(p, "%*d: ", prec, i);
- for_each_online_cpu(j)
- seq_printf(p, "%10u ", desc->kstat_irqs ?
- *per_cpu_ptr(desc->kstat_irqs, j) : 0);
+
+ if (desc->kstat_irqs) {
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", *per_cpu_ptr(desc->kstat_irqs, j));
+ } else {
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", 0);
+ }
raw_spin_lock_irqsave(&desc->lock, flags);
if (desc->irq_data.chip) {
--
2.25.1
On Mon, May 13 2024 at 20:05, Adrian Huang wrote:
> @@ -461,7 +461,7 @@ int show_interrupts(struct seq_file *p, void *v)
> {
> static int prec;
>
> - unsigned long flags, any_count = 0;
> + unsigned long flags, print_irq = 1;
What's wrong with making print_irq boolean?
> int i = *(loff_t *) v, j;
> struct irqaction *action;
> struct irq_desc *desc;
> @@ -488,18 +488,28 @@ int show_interrupts(struct seq_file *p, void *v)
> if (!desc || irq_settings_is_hidden(desc))
> goto outsparse;
>
> - if (desc->kstat_irqs) {
> - for_each_online_cpu(j)
> - any_count |= data_race(*per_cpu_ptr(desc->kstat_irqs, j));
> + if ((!desc->action || irq_desc_is_chained(desc)) && desc->kstat_irqs) {
The condition is wrong. Look how the old code evaluated any_count.
> + print_irq = 0;
> + for_each_online_cpu(j) {
> + if (data_race(*per_cpu_ptr(desc->kstat_irqs, j))) {
> + print_irq = 1;
> + break;
> + }
> + }
Aside from that, this code is fundamentally wrong in several aspects:
1) Interrupts which have no action are completely uninteresting as
there is no real information attached, i.e. it shows that there
were interrupts on some CPUs, but there is zero information from
which device they originated.
Especially with sparse interrupts enabled they are usually gone
shortly after the last action was removed.
2) Chained interrupts do not have a count at all as they completely
evade the core kernel entry points.
So all of this can be avoided and the whole nonsense can be reduced to:
if (!desc->action || irq_desc_is_chained(desc) || !desc->kstat_irqs)
goto outsparse;
which in turn allows converting this:
> - for_each_online_cpu(j)
> - seq_printf(p, "%10u ", desc->kstat_irqs ?
> - *per_cpu_ptr(desc->kstat_irqs, j) : 0);
into an unconditional:
for_each_online_cpu(j)
seq_printf(p, "%10u ", *per_cpu_ptr(desc->kstat_irqs, j));
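
Taken together, the relevant section then reduces to roughly the
following sketch (combining the two changes above with the existing
header line):

        if (!desc->action || irq_desc_is_chained(desc) || !desc->kstat_irqs)
                goto outsparse;

        seq_printf(p, "%*d: ", prec, i);
        for_each_online_cpu(j)
                seq_printf(p, "%10u ", *per_cpu_ptr(desc->kstat_irqs, j));

which drops both the any_count accumulation loop and the per-iteration
kstat_irqs NULL check.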
Thanks,
tglx