Subject: [PATCH] x86_64: resize NR_IRQS for large machines
From: Alan Mayer <[email protected]>
On machines with very large numbers of cpus, tables that are dimensioned
by NR_IRQS get very large, especially the irq_desc table. They are also
very sparsely used. When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
to set NR_IRQS, otherwise use NR_CPUS.
Signed-off-by: Alan Mayer <[email protected]>
Reviewed-by: Christoph Lameter <[email protected]>
---
Index: v2.6.25-rc6/include/asm-x86/irq_64.h
===================================================================
--- v2.6.25-rc6.orig/include/asm-x86/irq_64.h 2008-03-19 16:52:52.000000000 -0500
+++ v2.6.25-rc6/include/asm-x86/irq_64.h 2008-03-20 16:46:51.000000000 -0500
@@ -10,6 +10,10 @@
* <[email protected]>
*/
+#if !defined(MAX_IO_APICS)
+#include <asm/apicdef.h>
+#endif
+
#define TIMER_IRQ 0
/*
@@ -31,7 +35,11 @@
#define FIRST_SYSTEM_VECTOR 0xef /* duplicated in hw_irq.h */
-#define NR_IRQS (NR_VECTORS + (32 *NR_CPUS))
+#if NR_CPUS < MAX_IO_APICS
+#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
+#else
+#define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS))
+#endif
#define NR_IRQ_VECTORS NR_IRQS
static __inline__ int irq_canonicalize(int irq)
Index: v2.6.25-rc6/include/linux/kernel_stat.h
===================================================================
--- v2.6.25-rc6.orig/include/linux/kernel_stat.h 2008-03-19 16:53:00.000000000 -0500
+++ v2.6.25-rc6/include/linux/kernel_stat.h 2008-03-20 11:12:27.000000000 -0500
@@ -1,11 +1,11 @@
#ifndef _LINUX_KERNEL_STAT_H
#define _LINUX_KERNEL_STAT_H
-#include <asm/irq.h>
#include <linux/smp.h>
#include <linux/threads.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>
+#include <asm/irq.h>
#include <asm/cputime.h>
/*
* Alan Mayer <[email protected]> wrote:
> Subject: [PATCH] x86_64: resize NR_IRQS for large machines
>
> From: Alan Mayer <[email protected]>
>
> On machines with very large numbers of cpus, tables that are
> dimensioned by NR_IRQS get very large, especially the irq_desc table.
> They are also very sparsely used. When the cpu count is >
> MAX_IO_APICS, use MAX_IO_APICS to set NR_IRQS, otherwise use NR_CPUS.
>
> Signed-off-by: Alan Mayer <[email protected]>
> Reviewed-by: Christoph Lameter <[email protected]>
thanks, applied. I suspect this can wait until .26, as there appears to
be a number of other patches that you need to get these boxes running on
vanilla, right?
Ingo
On Tue, 25 Mar 2008, Ingo Molnar wrote:
>
> * Alan Mayer <[email protected]> wrote:
>
> > Subject: [PATCH] x86_64: resize NR_IRQS for large machines
> >
> > From: Alan Mayer <[email protected]>
> >
> > On machines with very large numbers of cpus, tables that are
> > dimensioned by NR_IRQS get very large, especially the irq_desc table.
> > They are also very sparsely used. When the cpu count is >
> > MAX_IO_APICS, use MAX_IO_APICS to set NR_IRQS, otherwise use NR_CPUS.
> >
> > Signed-off-by: Alan Mayer <[email protected]>
> > Reviewed-by: Christoph Lameter <[email protected]>
>
> thanks, applied. I suspect this can wait until .26, as there appears to
> be a number of other patches that you need to get these boxes running on
> vanilla, right?
>
> Ingo
That's right. Thanks.
--ajm
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Spirits are using me,
A larger voice is calling me.
--
Alan J. Mayer
SGI
[email protected]
WORK: 651-683-3131
HOME: 651-407-0134
--
Hi!
> Subject: [PATCH] x86_64: resize NR_IRQS for large machines
>
> From: Alan Mayer <[email protected]>
>
> On machines with very large numbers of cpus, tables that are dimensioned
> by NR_IRQS get very large, especially the irq_desc table. They are also
> very sparsely used. When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
> to set NR_IRQS, otherwise use NR_CPUS.
>
> Signed-off-by: Alan Mayer <[email protected]>
>
> Reviewed-by: Christoph Lameter <[email protected]>
> ===================================================================
> --- v2.6.25-rc6.orig/include/asm-x86/irq_64.h 2008-03-19 16:52:52.000000000 -0500
> +++ v2.6.25-rc6/include/asm-x86/irq_64.h 2008-03-20 16:46:51.000000000 -0500
> @@ -10,6 +10,10 @@
> * <[email protected]>
> */
>
> +#if !defined(MAX_IO_APICS)
> +#include <asm/apicdef.h>
> +#endif
> +
This is very ugly. Why not include it unconditionally -- with guard in
apicdef.h?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Wed, 26 Mar 2008, Pavel Machek wrote:
>
> This is very ugly. Why not include it unconditionally -- with guard in
> apicdef.h?
I do agree that it's ugly, but I think the ugliness is more serious than
that.
What I think we should do is to make NR_IRQS no longer be a compile-time
constant, but instead just do something like
unsigned int NR_IRQS __read_mostly;
and then just set it early in the boot sequence depending on the real CPU
numbers etc.
I realize that this will require some changes to a few arrays that are
statically allocated and depend on NR_IRQ's (notably "irq_desc"), but
don't you guys think that this would be a cleaner thing?
[ I suspect that irq_desc[] itself could quite reasonably be a rather much
smaller __read_mostly hash-table of dynamically allocated entries - the
thing would be only modified at boot, so it should cache beautifully
even across hundreds of CPU's ]
Whatever. I'm not opposed to this whole static thing, but I do wonder if
it's worth doing that way. There *may* be performance reasons for doing it
the way we're doing it, but quite frankly, I think the #define is mostly
purely historical, from when it was just a fixed number (originally 16!)
and it made sense to think of it as a small static array.
Linus
On Wed, 26 Mar 2008, Pavel Machek wrote:
> Hi!
>
> > Subject: [PATCH] x86_64: resize NR_IRQS for large machines
> >
> > From: Alan Mayer <[email protected]>
> >
> > On machines with very large numbers of cpus, tables that are dimensioned
> > by NR_IRQS get very large, especially the irq_desc table. They are also
> > very sparsely used. When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
> > to set NR_IRQS, otherwise use NR_CPUS.
> >
> > Signed-off-by: Alan Mayer <[email protected]>
> >
> > Reviewed-by: Christoph Lameter <[email protected]>
>
> > ===================================================================
> > --- v2.6.25-rc6.orig/include/asm-x86/irq_64.h 2008-03-19 16:52:52.000000000 -0500
> > +++ v2.6.25-rc6/include/asm-x86/irq_64.h 2008-03-20 16:46:51.000000000 -0500
> > @@ -10,6 +10,10 @@
> > * <[email protected]>
> > */
> >
> > +#if !defined(MAX_IO_APICS)
> > +#include <asm/apicdef.h>
> > +#endif
> > +
>
> This is very ugly. Why not include it unconditionally -- with guard in
> apicdef.h?
> Pavel
Okay, I can change that.
--ajm
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>
--
Alan J. Mayer
SGI
[email protected]
WORK: 651-683-3131
HOME: 651-407-0134
--
On Wed, 26 Mar 2008, Linus Torvalds wrote:
>
>
> On Wed, 26 Mar 2008, Pavel Machek wrote:
> >
> > This is very ugly. Why not include it unconditionally -- with guard in
> > apicdef.h?
>
> I do agree that it's ugly, but I think the ugliness is more serious than
> that.
>
> What I think we should do is to make NR_IRQS no longer be a compile-time
> constant, but instead just do something like
>
> unsigned int NR_IRQS __read_mostly;
>
> and then just set it early in the boot sequence depending on the real CPU
> numbers etc.
>
> I realize that this will require some changes to a few arrays that are
> statically allocated and depend on NR_IRQ's (notably "irq_desc"), but
> don't you guys think that this would be a cleaner thing?
>
> [ I suspect that irq_desc[] itself could quite reasonably be a rather much
> smaller __read_mostly hash-table of dynamically allocated entries - the
> thing would be only modified at boot, so it should cache beautifully
> even across hundreds of CPU's ]
>
> Whatever. I'm not opposed to this whole static thing, but I do wonder if
> it's worth doing that way. There *may* be performance reasons for doing it
> the way we're doing it, but quite frankly, I think the #define is mostly
> purely historical, from when it was just a fixed number (originally 16!)
> and it made sense to think of it as a small static array.
>
> Linus
>
Well, I was looking at it from that point of view. But, when I found myself
looking at code, particularly in drivers, that indexed into the irq_desc array
and started modifying the descriptor in place and then calling setup_irq(),
I realized that what was needed was a redesign of the whole mess from first
principals. I still think that's what needs to be done, but by some one with
more experience and credibility than me. Maybe in a year I'd be willing
to attempt it, but not today.
--ajm
--
Alan J. Mayer
SGI
[email protected]
WORK: 651-683-3131
HOME: 651-407-0134
--