2022-09-30 09:08:32

by chenlifu

Subject: [PATCH -next] genirq: Add SPARSE_NR_IRQS Kconfig option

On a large-scale multi-core NUMA platform, for example one with more than
1024 cores and 16 NUMA nodes, the interrupt number requirement cannot be
met even when SPARSE_IRQ is selected, which adds 8196 interrupt numbers on
top of NR_IRQS. Therefore, make the number of sparse interrupt numbers
configurable.

Signed-off-by: Chen Lifu <[email protected]>
---
kernel/irq/Kconfig | 8 ++++++++
kernel/irq/internals.h | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index db3d174c53d4..b517b820e329 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -123,10 +123,18 @@ config SPARSE_IRQ
( Sparse irqs can also be beneficial on NUMA boxes, as they spread
out the interrupt descriptors in a more NUMA-friendly way. )

If you don't know what to do here, say N.

+config SPARSE_NR_IRQS
+ int "Number of sparse interrupt numbers"
+ depends on SPARSE_IRQ
+ default 8196
+ help
+ This defines the maximum number of interrupt numbers
+ that can be dynamically expanded.
+
config GENERIC_IRQ_DEBUGFS
bool "Expose irq internals in debugfs"
depends on DEBUG_FS
select GENERIC_IRQ_INJECTION
default n
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index f09c60393e55..ab8ac93c60e6 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -10,11 +10,11 @@
#include <linux/kernel_stat.h>
#include <linux/pm_runtime.h>
#include <linux/sched/clock.h>

#ifdef CONFIG_SPARSE_IRQ
-# define IRQ_BITMAP_BITS (NR_IRQS + 8196)
+# define IRQ_BITMAP_BITS (NR_IRQS + CONFIG_SPARSE_NR_IRQS)
#else
# define IRQ_BITMAP_BITS NR_IRQS
#endif

#define istate core_internal_state__do_not_mess_with_it
--
2.37.1


2022-10-18 02:10:45

by chenlifu

Subject: Re: [PATCH -next] genirq: Add SPARSE_NR_IRQS Kconfig option

On 2022/9/30 16:58, Chen Lifu wrote:
> On a large-scale multi-core NUMA platform, for example one with more than
> 1024 cores and 16 NUMA nodes, the interrupt number requirement cannot be
> met even when SPARSE_IRQ is selected, which adds 8196 interrupt numbers on
> top of NR_IRQS. Therefore, make the number of sparse interrupt numbers
> configurable.
>
> Signed-off-by: Chen Lifu <[email protected]>
> ---
> kernel/irq/Kconfig | 8 ++++++++
> kernel/irq/internals.h | 2 +-
> 2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> index db3d174c53d4..b517b820e329 100644
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -123,10 +123,18 @@ config SPARSE_IRQ
> ( Sparse irqs can also be beneficial on NUMA boxes, as they spread
> out the interrupt descriptors in a more NUMA-friendly way. )
>
> If you don't know what to do here, say N.
>
> +config SPARSE_NR_IRQS
> + int "Number of sparse interrupt numbers"
> + depends on SPARSE_IRQ
> + default 8196
> + help
> + This defines the maximum number of interrupt numbers
> + that can be dynamically expanded.
> +
> config GENERIC_IRQ_DEBUGFS
> bool "Expose irq internals in debugfs"
> depends on DEBUG_FS
> select GENERIC_IRQ_INJECTION
> default n
> diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
> index f09c60393e55..ab8ac93c60e6 100644
> --- a/kernel/irq/internals.h
> +++ b/kernel/irq/internals.h
> @@ -10,11 +10,11 @@
> #include <linux/kernel_stat.h>
> #include <linux/pm_runtime.h>
> #include <linux/sched/clock.h>
>
> #ifdef CONFIG_SPARSE_IRQ
> -# define IRQ_BITMAP_BITS (NR_IRQS + 8196)
> +# define IRQ_BITMAP_BITS (NR_IRQS + CONFIG_SPARSE_NR_IRQS)
> #else
> # define IRQ_BITMAP_BITS NR_IRQS
> #endif
>
> #define istate core_internal_state__do_not_mess_with_it

Friendly ping ...

2022-10-31 00:53:39

by chenlifu

Subject: Re: [PATCH -next] genirq: Add SPARSE_NR_IRQS Kconfig option

On 2022/10/18 9:22, chenlifu wrote:
> On 2022/9/30 16:58, Chen Lifu wrote:
>> On a large-scale multi-core NUMA platform, for example one with more than
>> 1024 cores and 16 NUMA nodes, the interrupt number requirement cannot be
>> met even when SPARSE_IRQ is selected, which adds 8196 interrupt numbers on
>> top of NR_IRQS. Therefore, make the number of sparse interrupt numbers
>> configurable.
>>
>> Signed-off-by: Chen Lifu <[email protected]>
>> ---
>>   kernel/irq/Kconfig     | 8 ++++++++
>>   kernel/irq/internals.h | 2 +-
>>   2 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
>> index db3d174c53d4..b517b820e329 100644
>> --- a/kernel/irq/Kconfig
>> +++ b/kernel/irq/Kconfig
>> @@ -123,10 +123,18 @@ config SPARSE_IRQ
>>         ( Sparse irqs can also be beneficial on NUMA boxes, as they
>> spread
>>           out the interrupt descriptors in a more NUMA-friendly way. )
>>         If you don't know what to do here, say N.
>> +config SPARSE_NR_IRQS
>> +    int "Number of sparse interrupt numbers"
>> +    depends on SPARSE_IRQ
>> +    default 8196
>> +    help
>> +      This defines the maximum number of interrupt numbers
>> +      that can be dynamically expanded.
>> +
>>   config GENERIC_IRQ_DEBUGFS
>>       bool "Expose irq internals in debugfs"
>>       depends on DEBUG_FS
>>       select GENERIC_IRQ_INJECTION
>>       default n
>> diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
>> index f09c60393e55..ab8ac93c60e6 100644
>> --- a/kernel/irq/internals.h
>> +++ b/kernel/irq/internals.h
>> @@ -10,11 +10,11 @@
>>   #include <linux/kernel_stat.h>
>>   #include <linux/pm_runtime.h>
>>   #include <linux/sched/clock.h>
>>   #ifdef CONFIG_SPARSE_IRQ
>> -# define IRQ_BITMAP_BITS    (NR_IRQS + 8196)
>> +# define IRQ_BITMAP_BITS    (NR_IRQS + CONFIG_SPARSE_NR_IRQS)
>>   #else
>>   # define IRQ_BITMAP_BITS    NR_IRQS
>>   #endif
>>   #define istate core_internal_state__do_not_mess_with_it
>
> Friendly ping ...

Friendly ping ...

2022-11-10 04:10:54

by Thomas Gleixner

Subject: Re: [PATCH -next] genirq: Add SPARSE_NR_IRQS Kconfig option

On Fri, Sep 30 2022 at 16:58, Chen Lifu wrote:
> On a large-scale multi-core NUMA platform, for example one with more than
> 1024 cores and 16 NUMA nodes, the interrupt number requirement cannot be
> met even when SPARSE_IRQ is selected, which adds 8196 interrupt numbers on
> top of NR_IRQS. Therefore, make the number of sparse interrupt numbers
> configurable.

Why?

Let me walk you through:

# git grep '#define\sNR_IRQS' arch/x86
arch/x86/include/asm/irq_vectors.h:#define NR_IRQS_LEGACY 16
arch/x86/include/asm/irq_vectors.h:#define NR_IRQS \
arch/x86/include/asm/irq_vectors.h:#define NR_IRQS (NR_VECTORS + IO_APIC_VECTOR_LIMIT)
arch/x86/include/asm/irq_vectors.h:#define NR_IRQS (NR_VECTORS + CPU_VECTOR_LIMIT)
arch/x86/include/asm/irq_vectors.h:#define NR_IRQS NR_IRQS_LEGACY

Versus:

# git grep '#define\sNR_IRQS' arch/arm64

Empty. Oooops. So where does arm64 get the define from on which the define
you try to influence with your config knob depends, i.e.

# define IRQ_BITMAP_BITS (NR_IRQS + 8196)

Good question, right?

But not rocket science to answer. If there is no architecture specific
definition then there is a close to 100% probability that there is a
generic fallback define in include/asm-generic/. And indeed

# git grep '#define\sNR_IRQS' include/

include/asm-generic/irq.h:#define NR_IRQS 64

So let's do the math:

(64 + 8196) / 1024 ~= 8

Unsurprisingly enough, this barely copes with the per-CPU requirements
of an ARM64 system. Of course, if you add a proper amount of peripherals
then you surely run out of interrupt numbers....
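
The arithmetic above can be reproduced with a small standalone C sketch. It
is only an illustration: the macro names are made up for this example and are
not kernel symbols, and the 1024-CPU count is taken from the patch
description.

```c
#include <stdio.h>

/* Numbers quoted in the thread; the macro names are illustrative only. */
#define GENERIC_NR_IRQS      64    /* fallback in include/asm-generic/irq.h */
#define SPARSE_EXTRA_IRQS    8196  /* hard-coded addend in IRQ_BITMAP_BITS */
#define EXAMPLE_NR_CPUS      1024  /* CPU count from the patch description */

/* Total interrupt numbers available with the generic fallback. */
#define GENERIC_BITMAP_BITS  (GENERIC_NR_IRQS + SPARSE_EXTRA_IRQS)
/* Integer share of that total per CPU. */
#define GENERIC_IRQS_PER_CPU (GENERIC_BITMAP_BITS / EXAMPLE_NR_CPUS)

/* Print the total budget and the per-CPU share. */
void print_generic_irq_budget(void)
{
	printf("IRQ_BITMAP_BITS = %d, per CPU = %d\n",
	       GENERIC_BITMAP_BITS, GENERIC_IRQS_PER_CPU);
}
```

The total stays fixed no matter how many CPUs are configured, so the per-CPU
share shrinks as the machine grows.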

Now let me go back to the grep I did on x86. That matches on some lines
w/o context, but let me show you the full context of the relevant
configuration of an enterprise-class machine with all the bells and
whistles here for illustration:

#define NR_VECTORS 256
#define MAX_IO_APICS 128
#define CPU_VECTOR_LIMIT (64 * NR_CPUS)
#define IO_APIC_VECTOR_LIMIT (32 * MAX_IO_APICS)

So in the case of a NR_CPUS=1024 configuration this evaluates to:

CPU_VECTOR_LIMIT 64*1024 = 65536
IO_APIC_VECTOR_LIMIT 32*128 = 4096

and then the relevant NR_IRQS define:

#define NR_IRQS \
(CPU_VECTOR_LIMIT > IO_APIC_VECTOR_LIMIT ? \
(NR_VECTORS + CPU_VECTOR_LIMIT) : \
(NR_VECTORS + IO_APIC_VECTOR_LIMIT))

evaluates to:

(65536 > 4096) ? (256 + 65536) : (256 + 4096) = 256 + 65536 = 65792

While the resulting number is silly large, you should be able to figure
out the fundamental difference of the approach to limit the number of
interrupts between the x86 and the arm64 case, right?
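
For comparison, the x86 defines quoted above can be evaluated in a standalone
C sketch. The defines are copied from the mail with NR_CPUS fixed at 1024 as
in the example; the `print_x86_irq_budget()` helper and the
`X86_IRQS_PER_CPU` macro are added here for illustration and do not exist in
the kernel.

```c
#include <stdio.h>

#define NR_CPUS              1024  /* example configuration from the mail */
#define NR_VECTORS           256
#define MAX_IO_APICS         128
#define CPU_VECTOR_LIMIT     (64 * NR_CPUS)
#define IO_APIC_VECTOR_LIMIT (32 * MAX_IO_APICS)

/* NR_VECTORS is added to whichever limit is larger. */
#define NR_IRQS \
	(CPU_VECTOR_LIMIT > IO_APIC_VECTOR_LIMIT ? \
		(NR_VECTORS + CPU_VECTOR_LIMIT) : \
		(NR_VECTORS + IO_APIC_VECTOR_LIMIT))

#define IRQ_BITMAP_BITS      (NR_IRQS + 8196)

/* Illustrative only: integer share of interrupt numbers per CPU. */
#define X86_IRQS_PER_CPU     (IRQ_BITMAP_BITS / NR_CPUS)

void print_x86_irq_budget(void)
{
	printf("NR_IRQS = %d, IRQ_BITMAP_BITS = %d, per CPU = %d\n",
	       NR_IRQS, IRQ_BITMAP_BITS, X86_IRQS_PER_CPU);
}
```

Because CPU_VECTOR_LIMIT is a multiple of NR_CPUS, the x86 total grows with
the configured CPU count instead of staying at a fixed constant.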

There is no good reason to copy the insanely large numbers of x86 but
you surely can figure out how the key component in that calculation
which takes NR_CPUS into account avoids another random incomprehensible
configuration knob:

# define IRQ_BITMAP_BITS (NR_IRQS + 8196)

No?
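
One possible reading of that hint, as a rough sketch only: let the sparse
range grow with the CPU count instead of adding a Kconfig knob. Everything
below is hypothetical. The `SCALED_BITMAP_BITS()` macro, its per-CPU factor
and the helper function are invented for this illustration and are not taken
from the kernel or from any merged patch.

```c
#include <stdio.h>

#define GENERIC_NR_IRQS   64    /* fallback in include/asm-generic/irq.h */
#define SPARSE_EXTRA_IRQS 8196  /* existing hard-coded addend */

/*
 * Hypothetical: add a per-CPU share on top of the existing total, so the
 * budget follows the configured CPU count. The factor is an assumption
 * and would have to be chosen per architecture.
 */
#define SCALED_BITMAP_BITS(nr_cpus, irqs_per_cpu) \
	(GENERIC_NR_IRQS + SPARSE_EXTRA_IRQS + (irqs_per_cpu) * (nr_cpus))

/* Print the scaled total for a given CPU count and per-CPU factor. */
void print_scaled_irq_budget(int nr_cpus, int irqs_per_cpu)
{
	printf("nr_cpus = %d, factor = %d, bits = %d\n",
	       nr_cpus, irqs_per_cpu,
	       SCALED_BITMAP_BITS(nr_cpus, irqs_per_cpu));
}
```

With an assumed factor of 16, a 4-CPU build stays close to the current total
while a 1024-CPU build gets roughly three times as many interrupt numbers,
without anyone having to pick a number in menuconfig.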

Thanks,

tglx