2008-03-26 21:11:41

by Alan Mayer

Subject: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)


Subject: [PATCH] x86_64: resize NR_IRQS for large machines

From: Alan Mayer <[email protected]>

On machines with very large numbers of cpus, tables that are dimensioned
by NR_IRQS get very large, especially the irq_desc table. They are also
very sparsely used. When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
to set NR_IRQS, otherwise use NR_CPUS.

Signed-off-by: Alan Mayer <[email protected]>

Reviewed-by: Christoph Lameter <[email protected]>

---

Index: v2.6.25-rc6/include/asm-x86/irq_64.h
===================================================================
--- v2.6.25-rc6.orig/include/asm-x86/irq_64.h 2008-03-19 16:52:52.000000000 -0500
+++ v2.6.25-rc6/include/asm-x86/irq_64.h 2008-03-26 14:02:32.000000000 -0500
@@ -10,6 +10,8 @@
  * <[email protected]>
  */

+#include <asm/apicdef.h>
+
 #define TIMER_IRQ 0

 /*
@@ -31,7 +33,11 @@

 #define FIRST_SYSTEM_VECTOR 0xef /* duplicated in hw_irq.h */

-#define NR_IRQS (NR_VECTORS + (32 *NR_CPUS))
+#if NR_CPUS < MAX_IO_APICS
+#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
+#else
+#define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS))
+#endif
 #define NR_IRQ_VECTORS NR_IRQS

 static __inline__ int irq_canonicalize(int irq)
Index: v2.6.25-rc6/include/linux/kernel_stat.h
===================================================================
--- v2.6.25-rc6.orig/include/linux/kernel_stat.h 2008-03-19 16:53:00.000000000 -0500
+++ v2.6.25-rc6/include/linux/kernel_stat.h 2008-03-20 11:12:27.000000000 -0500
@@ -1,11 +1,11 @@
 #ifndef _LINUX_KERNEL_STAT_H
 #define _LINUX_KERNEL_STAT_H

-#include <asm/irq.h>
 #include <linux/smp.h>
 #include <linux/threads.h>
 #include <linux/percpu.h>
 #include <linux/cpumask.h>
+#include <asm/irq.h>
 #include <asm/cputime.h>

 /*


2008-03-26 21:24:53

by Ingo Molnar

Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)


* Alan Mayer <[email protected]> wrote:

> On machines with very large numbers of cpus, tables that are
> dimensioned by NR_IRQS get very large, especially the irq_desc table.
> They are also very sparsely used. When the cpu count is >
> MAX_IO_APICS, use MAX_IO_APICS to set NR_IRQS, otherwise use NR_CPUS.

thanks Alan, applied this in place of the other patch.

this bit is still ugly:

> -#define NR_IRQS (NR_VECTORS + (32 *NR_CPUS))
> +#if NR_CPUS < MAX_IO_APICS
> +#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
> +#else
> +#define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS))
> +#endif
> #define NR_IRQ_VECTORS NR_IRQS

but it doesn't really depart from the current status quo of huge
[NR_IRQS] arrays either. Patches that make NR_IRQS a variable are
welcome :)

Ingo
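
For a sense of scale, the effect of the cap in the patch can be worked
out with a few lines of C. This is only a sketch: the values 256 for
NR_VECTORS and 128 for MAX_IO_APICS are assumptions for illustration
(check the headers in your own tree), and the EX_ macros and function
names are made up for this example.

```c
#include <assert.h>

/* Assumed example values, not taken from the patch itself. */
#define EX_NR_VECTORS   256UL
#define EX_MAX_IO_APICS 128UL

/* NR_IRQS as computed before the patch: grows with every CPU. */
unsigned long nr_irqs_old(unsigned long nr_cpus)
{
	return EX_NR_VECTORS + 32UL * nr_cpus;
}

/* NR_IRQS as computed after the patch: the CPU term is capped. */
unsigned long nr_irqs_new(unsigned long nr_cpus)
{
	unsigned long n = nr_cpus < EX_MAX_IO_APICS ? nr_cpus : EX_MAX_IO_APICS;

	return EX_NR_VECTORS + 32UL * n;
}

/* Bytes taken by one statically sized table with nr_irqs entries. */
unsigned long irq_table_bytes(unsigned long nr_irqs, unsigned long entry_size)
{
	assert(entry_size > 0);
	return nr_irqs * entry_size;
}
```

With those assumed values, a 4096-CPU configuration drops from 131328
to 4352 IRQ slots, while an 8-CPU configuration stays at 512 either
way. Every array dimensioned by NR_IRQS shrinks by the same factor.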

2008-03-26 21:40:35

by Alan Mayer

Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)

On Wed, 26 Mar 2008, Ingo Molnar wrote:

>
> * Alan Mayer <[email protected]> wrote:
>
> > On machines with very large numbers of cpus, tables that are
> > dimensioned by NR_IRQS get very large, especially the irq_desc table.
> > They are also very sparsely used. When the cpu count is >
> > MAX_IO_APICS, use MAX_IO_APICS to set NR_IRQS, otherwise use NR_CPUS.
>
> thanks Alan, applied this in place of the other patch.
>
> this bit is still ugly:
>
> > -#define NR_IRQS (NR_VECTORS + (32 *NR_CPUS))
> > +#if NR_CPUS < MAX_IO_APICS
> > +#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
> > +#else
> > +#define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS))
> > +#endif
> > #define NR_IRQ_VECTORS NR_IRQS
>
> but it doesn't really depart from the current status quo of huge
> [NR_IRQS] arrays either. Patches that make NR_IRQS a variable are
> welcome :)
>
> Ingo
>

Good luck with that. If I come up with something that's elegant enough
to make it worth the risk, I'll let you know. Changing NR_IRQS to a variable
touches every arch and a lot of drivers. Someone is bound to choke
on it, so it has to be something worth fighting for.

--ajm

Lately it occurs to me,
What a long, strange trip it's been.
--
Alan J. Mayer
SGI
[email protected]
WORK: 651-683-3131
HOME: 651-407-0134
--

2008-03-26 22:25:17

by Ingo Molnar

Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)


* Alan Mayer <[email protected]> wrote:

> > but it doesn't really depart from the current status quo of huge
> > [NR_IRQS] arrays either. Patches that make NR_IRQS a variable are
> > welcome :)
>
> Good luck with that. If I come up with something that's elegant
> enough to make it worth the risk, I'll let you know. Changing NR_IRQS
> to a variable touches every arch and a lot of drivers. Someone is
> bound to choke on it, so it has to be something worth fighting for.

well, i don't think it has to be (or it should be) an all or nothing patch,
given the complexity and risks involved.

- we should first introduce a nr_irqs variable and a Kconfig switch
(say CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS) for architectures to toggle. If
the switch is toggled, nr_irqs is a variable, otherwise it's a carbon
copy of NR_IRQS. Some array-definition, declaration and initialization
wrappers are provided as well.

- then the core code, x86 and most drivers can be converted to nr_irqs.
The switch might initially even be user-selectable if
CONFIG_DEBUG_KERNEL, to ease regression testing.

- other architectures will follow one by one, fixing their
arch-dependent drivers as well in the process

- finally we get rid of the wrappers.

Ingo
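
The first step of the plan above might look roughly like the following
userspace sketch. Everything in it is an assumption for illustration:
the macro names DEFINE_IRQ_ARRAY and INIT_IRQ_ARRAY, the example
struct, the helper functions, and the use of calloc() where the kernel
would use its own allocator. Only nr_irqs, NR_IRQS and the
CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS name come from the proposal.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_IRQS 512	/* stand-in for the arch's compile-time bound */

/* Hypothetical switch; remove this line to get the static variant. */
#define CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS 1

#ifdef CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS
int nr_irqs = NR_IRQS;		/* a real variable, may be lowered at init */
#define DEFINE_IRQ_ARRAY(type, name)	type *name
#define INIT_IRQ_ARRAY(name) \
	(((name) = calloc((size_t)nr_irqs, sizeof(*(name)))) != NULL)
#else
#define nr_irqs NR_IRQS		/* carbon copy of the constant */
#define DEFINE_IRQ_ARRAY(type, name)	type name[NR_IRQS]
#define INIT_IRQ_ARRAY(name)	(1)
#endif

/* Example per-IRQ record, invented for this sketch. */
struct irq_stat_example {
	unsigned long count;
};

/* Same source line yields a pointer or a fixed array, per the switch. */
DEFINE_IRQ_ARRAY(struct irq_stat_example, irq_stats);

/* Pick the table size (dynamic case only) and set the table up. */
int irq_stats_init(int wanted)
{
#ifdef CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS
	if (wanted > 0 && wanted < NR_IRQS)
		nr_irqs = wanted;
#else
	(void)wanted;
#endif
	return INIT_IRQ_ARRAY(irq_stats) ? 0 : -1;
}

/* Converted code bounds-checks against nr_irqs, never NR_IRQS. */
void irq_count(int irq)
{
	assert(irq >= 0 && irq < nr_irqs);
	irq_stats[irq].count++;
}
```

Code written against nr_irqs and the wrappers compiles unchanged in
both modes, which is what would let architectures flip the switch one
at a time before the wrappers are finally removed.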

2008-03-27 16:16:19

by Alan Mayer

Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)

> well, i don't think it has to be (or it should be) an all or nothing patch,
> given the complexity and risks involved.
>
> - we should first introduce a nr_irqs variable and a Kconfig switch
> (say CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS) for architectures to toggle. If
> the switch is toggled, nr_irqs is a variable, otherwise it's a carbon
> copy of NR_IRQS. Some array-definition, declaration and initialization
> wrappers are provided as well.
>
> - then the core code, x86 and most drivers can be converted to nr_irqs.
> The switch might initially even be user-selectable if
> CONFIG_DEBUG_KERNEL, to ease regression testing.
>
> - other architectures will follow one by one, fixing their
> arch-dependent drivers as well in the process
>
> - finally we get rid of the wrappers.
>
> Ingo
>

Okay, let's see if I understand this.

First patch introduces a config switch and a variable, nr_irqs, that is
set to NR_IRQS. It also dynamically allocates the currently statically
allocated arrays that are dimensioned by NR_IRQS. It also initializes
these dynamically allocated data structures. This is all done under
the config switch, initially off by default.

Second patch changes core code, x86 and most drivers to use nr_irqs.
This patch will also introduce a calculation of nr_irqs, based on
interrupt sources, that is a better estimate of the number of irqs
in the running system than just picking a guaranteed not-to-exceed
value that may be too big.
Is there a way to identify which drivers need to be addressed?

Then, test the crap out of it.

Other architectures will follow, with the work being done by people
familiar with those architectures.

Clean up anything that's left over that's now been made unnecessary by
the conversion by everyone. Including the config option?

Do I have the gist of it?

--ajm

We are star dust,
We are golden,
We are caught in the Devil's bargain.

2008-03-27 16:33:42

by Ingo Molnar

Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)


* Alan Mayer <[email protected]> wrote:

> > well, i don't think it has to be (or it should be) an all or nothing patch,
> > given the complexity and risks involved.
> >
> > - we should first introduce a nr_irqs variable and a Kconfig switch
> > (say CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS) for architectures to toggle. If
> > the switch is toggled, nr_irqs is a variable, otherwise it's a carbon
> > copy of NR_IRQS. Some array-definition, declaration and initialization
> > wrappers are provided as well.
> >
> > - then the core code, x86 and most drivers can be converted to nr_irqs.
> > The switch might initially even be user-selectable if
> > CONFIG_DEBUG_KERNEL, to ease regression testing.
> >
> > - other architectures will follow one by one, fixing their
> > arch-dependent drivers as well in the process
> >
> > - finally we get rid of the wrappers.
> >
> > Ingo
> >
>
> Okay, let's see if I understand this.
>
> First patch introduces a config switch and a variable, nr_irqs, that is
> set to NR_IRQS. It also dynamically allocates the currently statically
> allocated arrays that are dimensioned by NR_IRQS. It also initializes
> these dynamically allocated data structures. This is all done under
> the config switch, initially off by default.
>
> Second patch changes core code, x86 and most drivers to use nr_irqs.
> This patch will also introduce a calculation of nr_irqs, based on
> interrupt sources, that is a better estimate of the number of irqs
> in the running system than just picking a guaranteed not-to-exceed
> value that may be too big.
> Is there a way to identify which drivers need to be addressed?
>
> Then, test the crap out of it.
>
> Other architectures will follow, with the work being done by people
> familiar with those architectures.
>
> Clean up anything that's left over that's now been made unnecessary by
> the conversion by everyone. Including the config option?
>
> Do I have the gist of it?

i think you got it right, yes. But ... this is just a quick first-look
suggestion from me, YMMV. Maybe you'll find it much easier to just
convert everything at once. I tend to do things more gradually: in my
experience it's very hard and time-consuming to change the world all at
once. It's hard for you the developer (you don't know whether it works
until you have a very substantial amount of code written, while in a
more gradual approach things can be converted one by one) and it's hard
for users and fellow kernel hackers.

Ingo
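
The "better estimate" in Alan's second step could, for instance, add up
what the discovered interrupt controllers can actually deliver and clamp
the result to the compile-time ceiling. The sketch below is purely
illustrative: struct irq_source, estimate_nr_irqs() and the headroom
parameter are hypothetical names, and the NR_IRQS value is just an
example.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IRQS 4352	/* example compile-time ceiling */

/* Hypothetical record for one discovered interrupt controller. */
struct irq_source {
	int nr_pins;	/* interrupt lines this controller provides */
};

/*
 * Sum the lines of all discovered sources, add some headroom for
 * interrupts that are allocated later, and never exceed NR_IRQS.
 */
int estimate_nr_irqs(const struct irq_source *src, size_t n, int headroom)
{
	size_t i;
	int total = headroom;

	assert(headroom >= 0);
	for (i = 0; i < n; i++) {
		assert(src[i].nr_pins >= 0);
		total += src[i].nr_pins;
	}
	return total < NR_IRQS ? total : NR_IRQS;
}
```

A machine with two 24-pin controllers and a headroom of 16 would end up
with nr_irqs = 64 here, far below the not-to-exceed value, and tables
allocated from nr_irqs shrink accordingly.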

2008-03-27 16:54:18

by Alan Mayer

Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)

> i think you got it right, yes. But ... this is just a quick first-look
> suggestion from me, YMMV. Maybe you'll find it much easier to just
> convert everything at once. I tend to do things more gradually: in my
> experience it's very hard and time-consuming to change the world all at
> once. It's hard for you the developer (you don't know whether it works
> until you have a very substantial amount of code written, while in a
> more gradual approach things can be converted one by one) and it's hard
> for users and fellow kernel hackers.
>
> Ingo
>

I think I'll take a crack at it. Doing it in phases means I can invest
a little less time and still give everyone a chance to see if they like
it. And, the initial stuff seems like the area where people will be the
most likely to have problems and/or suggestions. I have no idea how
long it will take to get something out there. Stay tuned.

--ajm


Lately it occurs to me,
What a long, strange trip it's been.

2008-03-28 09:32:57

by Ingo Molnar

Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)


* Alan Mayer <[email protected]> wrote:

> > i think you got it right, yes. But ... this is just a quick
> > first-look suggestion from me, YMMV. Maybe you'll find it much
> > easier to just convert everything at once. I tend to do things more
> > gradually: in my experience it's very hard and time-consuming to
> > change the world all at once. It's hard for you the developer (you
> > don't know whether it works until you have a very substantial
> > amount of code written, while in a more gradual approach things can
> > be converted one by one) and it's hard for users and fellow kernel
> > hackers.
>
> I think I'll take a crack at it. Doing it in phases means I can
> invest a little less time and still give everyone a chance to see if
> they like it. And, the initial stuff seems like the area where people
> will be the most likely to have problems and/or suggestions. I have
> no idea how long it will take to get something out there. Stay tuned.

great! If your initial target for this is x86 (which certainly has the
most twisted IRQ architecture of all architectures) then we'd be glad to
host and test your patches in x86.git/latest - even if they touch
drivers and other core code. Once it's proven enough it could be
rehosted to -mm.

Ingo