From: Adrian Huang <[email protected]>
The current approach blindly iterates over the irq numbers until the number is
greater than 'nr_irqs', checking whether each irq is allocated.
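For reference, the iterator in fs/proc/interrupts.c is roughly the following
(a simplified excerpt matching the context lines quoted in the hunks further
down): it steps the position one irq at a time and only stops once the
position exceeds 'nr_irqs', so every unallocated hole is still visited and
handed to show_interrupts().

static void *int_seq_next(struct seq_file *f, void *v, loff_t *pos)
{
	(*pos)++;
	if (*pos > nr_irqs)
		return NULL;
	return pos;
}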
Here is an example:
* 2-socket server with 488 cores (HT-enabled).
* The last allocated irq is 508. [1]
* nr_irqs = 8360. The following is from dmesg.
NR_IRQS: 524544, nr_irqs: 8360, preallocated irqs: 16
The 7852 iterations from irq 509 to 8360 (8360 - 509 + 1) are unnecessary.
In addition, there are unallocated irq holes within the irq 0-508 range. [1]
The solution is to try to jump over an unallocated irq hole when one is
detected: rewrite the seq_file position to the next allocated irq minus one,
so that the iterator's subsequent increment lands on that irq.
Test Result
-----------
* The following ftrace log confirms that this patch jumps over the
unallocated irq holes (less work is done in seq_read_iter()).
** ftrace w/ patch:
| seq_read_iter() {
+---2230 lines: 0.791 us | mutex_lock();------------
| seq_read_iter() {
0.621 us | mutex_lock();
0.391 us | int_seq_start();
0.411 us | int_seq_stop();
0.391 us | mutex_unlock();
3.916 us | }
** ftrace w/o patch:
| seq_read_iter() {
+--17955 lines: 0.722 us | mutex_lock();------------
| seq_read_iter() {
0.621 us | mutex_lock();
0.400 us | int_seq_start();
0.380 us | int_seq_stop();
0.381 us | mutex_unlock();
3.946 us | }
* The following result is the average execution time of seq_read_iter()
over five measurements.
no patch (us) patched (us) saved
------------- ------------ -------
158552 148094 6.6%
[1] https://gist.github.com/AdrianHuang/6c60b8053b2b3ecf6da56dec7a0eae70
Tested-by: Jiwei Sun <[email protected]>
Signed-off-by: Adrian Huang <[email protected]>
---
kernel/irq/proc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 623b8136e9af..756bdc1fd07b 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -485,7 +485,14 @@ int show_interrupts(struct seq_file *p, void *v)
rcu_read_lock();
desc = irq_to_desc(i);
- if (!desc || irq_settings_is_hidden(desc))
+ if (!desc) {
+ /* Try to jump over the unallocated irq hole. */
+ *(int *) v = irq_get_next_irq(i + 1) - 1;
+
+ goto outsparse;
+ }
+
+ if (irq_settings_is_hidden(desc))
goto outsparse;
if (desc->kstat_irqs) {
--
2.25.1
On Mon, Mar 25 2024 at 11:51, Adrian Huang wrote:
> * The following result is the average execution time of five-time
> measurements about seq_read_iter().
>
> no patch (us) patched (us) saved
> ------------- ------------ -------
> 158552 148094 6.6%
>
> [1] https://gist.github.com/AdrianHuang/6c60b8053b2b3ecf6da56dec7a0eae70
These gist links are useless within no time and not of real value. We
all know that there are holes.
> Tested-by: Jiwei Sun <[email protected]>
> Signed-off-by: Adrian Huang <[email protected]>
> ---
> kernel/irq/proc.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
> index 623b8136e9af..756bdc1fd07b 100644
> --- a/kernel/irq/proc.c
> +++ b/kernel/irq/proc.c
> @@ -485,7 +485,14 @@ int show_interrupts(struct seq_file *p, void *v)
>
> rcu_read_lock();
> desc = irq_to_desc(i);
> - if (!desc || irq_settings_is_hidden(desc))
> + if (!desc) {
> + /* Try to jump over the unallocated irq hole. */
> + *(int *) v = irq_get_next_irq(i + 1) - 1;
The position is loff_t and not int. See the type cast at the beginning
of that function. But let's leave this detail aside and let me ask the
real question:
Why are you not handling it at the place where the seq_file position is
iterated instead of doing this + 1 - 1 game here and getting the next
number only when you already ran into the hole?
That's the obvious place, no?
Thanks,
tglx
On Tue, Mar 26, 2024 at 6:32 AM Thomas Gleixner <[email protected]> wrote:
>
> Why are you not handling it at the place where the seq_file position is
> iterated instead of doing this + 1 - 1 game here and getting the next
> number only when you already ran into the hole?
Thanks for the comments.
Yes, I did think about the changes in int_seq_next().
The reason I made the changes in show_interrupts() is to minimize the
number of traversals of the 'sparse_irqs' maple tree, since irq_to_desc()
is already invoked in show_interrupts().
Does the following patch meet your expectations? If yes, I'll submit
it for review.
Thanks.
---
diff --git a/fs/proc/interrupts.c b/fs/proc/interrupts.c
index cb0edc7cb..ac051a9b8 100644
--- a/fs/proc/interrupts.c
+++ b/fs/proc/interrupts.c
@@ -19,6 +19,10 @@ static void *int_seq_next(struct seq_file *f, void *v, loff_t *pos)
(*pos)++;
if (*pos > nr_irqs)
return NULL;
+
+ if (!irq_to_desc(*pos))
+ *pos = irq_get_next_irq(*pos);
+
return pos;
}
On Tue, Mar 26 2024 at 21:35, Huang Adrian wrote:
> On Tue, Mar 26, 2024 at 6:32 AM Thomas Gleixner <[email protected]> wrote:
> The reason I made the changes in show_interrupts() is to minimize the
> traversal times of the 'sparse_irqs' maple tree since irq_to_desc() is
> invoked in show_interrupts().
They are not really relevant. The cache line is hot after the
irq_get_next_irq() lookup and the maple tree is a highly optimized data
structure.
I'm not saying that it is free, but if you want to avoid that in the
first place then you need to do a lookup of the next descriptor and hand
it into show_interrupts() right away and not just hack some completely
obscure optimization into show_interrupts().
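One possible reading of that suggestion, sketched here purely for
illustration (this is not a patch from this thread; the seq_file
private-pointer plumbing and the descriptor lifetime handling are
assumptions):

/*
 * Illustrative sketch only: resolve the descriptor while advancing the
 * position and hand it to the show() callback, so show_interrupts()
 * does not have to call irq_to_desc() a second time.
 */
static void *int_seq_next(struct seq_file *f, void *v, loff_t *pos)
{
	struct irq_desc *desc;

	(*pos)++;
	if (*pos > nr_irqs)
		return NULL;

	rcu_read_lock();
	desc = irq_to_desc(*pos);
	if (!desc) {
		/* Jump over the hole and pick up the next allocated irq. */
		*pos = irq_get_next_irq(*pos);
		desc = irq_to_desc(*pos);
	}
	rcu_read_unlock();

	/*
	 * Hand the descriptor over, e.g. via the seq_file private pointer.
	 * How the pointer stays valid outside the RCU read-side section is
	 * left open in this sketch.
	 */
	f->private = desc;

	return pos;
}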
> @@ -19,6 +19,10 @@ static void *int_seq_next(struct seq_file *f, void *v, loff_t *pos)
> (*pos)++;
> if (*pos > nr_irqs)
> return NULL;
> +
> + if (!irq_to_desc(*pos))
> + *pos = irq_get_next_irq(*pos);
How is irq_get_next_irq() valid without either holding the sparse irq
lock or RCU read lock?
Testing this with full debug enabled might give you the answer to that
question.
But that aside you are missing a massive performance problem in the
generic show_interrupts() code:
if (desc->kstat_irqs) {
for_each_online_cpu(j)
any_count |= data_race(*per_cpu_ptr(desc->kstat_irqs, j));
}
if ((!desc->action || irq_desc_is_chained(desc)) && !any_count)
goto outsparse;
There are two obvious problems with that, no?
Thanks,
tglx
On Wed, Mar 27, 2024 at 7:36 AM Thomas Gleixner <[email protected]> wrote:
>
> > @@ -19,6 +19,10 @@ static void *int_seq_next(struct seq_file *f, void *v, loff_t *pos)
> > (*pos)++;
> > if (*pos > nr_irqs)
> > return NULL;
> > +
> > + if (!irq_to_desc(*pos))
> > + *pos = irq_get_next_irq(*pos);
>
> How is irq_get_next_irq() valid without either holding the sparse irq
> lock or RCU read lock?
Indeed, thanks for pointing this out. How about the patch below?
---
diff --git a/fs/proc/interrupts.c b/fs/proc/interrupts.c
index cb0edc7cb..111ea8a3c 100644
--- a/fs/proc/interrupts.c
+++ b/fs/proc/interrupts.c
@@ -19,6 +19,12 @@ static void *int_seq_next(struct seq_file *f, void *v, loff_t *pos)
(*pos)++;
if (*pos > nr_irqs)
return NULL;
+
+ rcu_read_lock();
+ if (!irq_to_desc(*pos))
+ *pos = irq_get_next_irq(*pos);
+ rcu_read_unlock();
+
return pos;
}
> But that aside you are missing a massive performance problem in the
> generic show_interrupts() code:
>
> if (desc->kstat_irqs) {
> for_each_online_cpu(j)
> any_count |= data_race(*per_cpu_ptr(desc->kstat_irqs, j));
> }
>
> if ((!desc->action || irq_desc_is_chained(desc)) && !any_count)
> goto outsparse;
>
Yes, good point. The change is shown below.
BTW, I also changed the loop that prints each cpu's irq stat so that the
percpu 'kstat_irqs' pointer does not need to be checked on every iteration.
If you're ok, I'll send out the series.
---
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 623b8136e..bfa341fac 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -461,7 +461,7 @@ int show_interrupts(struct seq_file *p, void *v)
{
static int prec;
- unsigned long flags, any_count = 0;
+ unsigned long flags, print_irq = 1;
int i = *(loff_t *) v, j;
struct irqaction *action;
struct irq_desc *desc;
@@ -488,18 +488,28 @@ int show_interrupts(struct seq_file *p, void *v)
if (!desc || irq_settings_is_hidden(desc))
goto outsparse;
- if (desc->kstat_irqs) {
- for_each_online_cpu(j)
- any_count |= data_race(*per_cpu_ptr(desc->kstat_irqs, j));
+ if ((!desc->action || irq_desc_is_chained(desc)) && desc->kstat_irqs) {
+ print_irq = 0;
+ for_each_online_cpu(j) {
+ if (data_race(*per_cpu_ptr(desc->kstat_irqs, j))) {
+ print_irq = 1;
+ break;
+ }
+ }
}
- if ((!desc->action || irq_desc_is_chained(desc)) && !any_count)
+ if (!print_irq)
goto outsparse;
seq_printf(p, "%*d: ", prec, i);
- for_each_online_cpu(j)
- seq_printf(p, "%10u ", desc->kstat_irqs ?
- *per_cpu_ptr(desc->kstat_irqs, j) : 0);
+
+ if (desc->kstat_irqs) {
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", data_race(*per_cpu_ptr(desc->kstat_irqs, j)));
+ } else {
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", 0);
+ }
raw_spin_lock_irqsave(&desc->lock, flags);
if (desc->irq_data.chip) {
--
Adrian