2008-03-24 14:31:14

by Alan Mayer

[permalink] [raw]
Subject: [PATCH] x86_64: resize NR_IRQS for large machines

Subject: [PATCH] x86_64: resize NR_IRQS for large machines

From: Alan Mayer <[email protected]>

On machines with very large numbers of cpus, tables that are dimensioned
by NR_IRQS get very large, especially the irq_desc table. They are also
very sparsely used. When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
to set NR_IRQS, otherwise use NR_CPUS.

Signed-off-by: Alan Mayer <[email protected]>

Reviewed-by: Christoph Lameter <[email protected]>

---

Index: v2.6.25-rc6/include/asm-x86/irq_64.h
===================================================================
--- v2.6.25-rc6.orig/include/asm-x86/irq_64.h 2008-03-19 16:52:52.000000000 -0500
+++ v2.6.25-rc6/include/asm-x86/irq_64.h 2008-03-20 16:46:51.000000000 -0500
@@ -10,6 +10,10 @@
* <[email protected]>
*/

+#if !defined(MAX_IO_APICS)
+#include <asm/apicdef.h>
+#endif
+
#define TIMER_IRQ 0

/*
@@ -31,7 +35,11 @@

#define FIRST_SYSTEM_VECTOR 0xef /* duplicated in hw_irq.h */

-#define NR_IRQS (NR_VECTORS + (32 *NR_CPUS))
+#if NR_CPUS < MAX_IO_APICS
+#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
+#else
+#define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS))
+#endif
#define NR_IRQ_VECTORS NR_IRQS

static __inline__ int irq_canonicalize(int irq)
Index: v2.6.25-rc6/include/linux/kernel_stat.h
===================================================================
--- v2.6.25-rc6.orig/include/linux/kernel_stat.h 2008-03-19 16:53:00.000000000 -0500
+++ v2.6.25-rc6/include/linux/kernel_stat.h 2008-03-20 11:12:27.000000000 -0500
@@ -1,11 +1,11 @@
#ifndef _LINUX_KERNEL_STAT_H
#define _LINUX_KERNEL_STAT_H

-#include <asm/irq.h>
#include <linux/smp.h>
#include <linux/threads.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>
+#include <asm/irq.h>
#include <asm/cputime.h>

/*


2008-03-25 16:19:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines


* Alan Mayer <[email protected]> wrote:

> Subject: [PATCH] x86_64: resize NR_IRQS for large machines
>
> From: Alan Mayer <[email protected]>
>
> On machines with very large numbers of cpus, tables that are
> dimensioned by NR_IRQS get very large, especially the irq_desc table.
> They are also very sparsely used. When the cpu count is >
> MAX_IO_APICS, use MAX_IO_APICS to set NR_IRQS, otherwise use NR_CPUS.
>
> Signed-off-by: Alan Mayer <[email protected]>
> Reviewed-by: Christoph Lameter <[email protected]>

thanks, applied. I suspect this can wait until .26, as there appears to
be a number of other patches that you need to get these boxes running on
vanilla, right?

Ingo

2008-03-25 16:24:17

by Alan Mayer

[permalink] [raw]
Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines

On Tue, 25 Mar 2008, Ingo Molnar wrote:

>
> * Alan Mayer <[email protected]> wrote:
>
> > Subject: [PATCH] x86_64: resize NR_IRQS for large machines
> >
> > From: Alan Mayer <[email protected]>
> >
> > On machines with very large numbers of cpus, tables that are
> > dimensioned by NR_IRQS get very large, especially the irq_desc table.
> > They are also very sparsely used. When the cpu count is >
> > MAX_IO_APICS, use MAX_IO_APICS to set NR_IRQS, otherwise use NR_CPUS.
> >
> > Signed-off-by: Alan Mayer <[email protected]>
> > Reviewed-by: Christoph Lameter <[email protected]>
>
> thanks, applied. I suspect this can wait until .26, as there appears to
> be a number of other patches that you need to get these boxes running on
> vanilla, right?
>
> Ingo

That's right. Thanks.

--ajm

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

Spirits are using me,
A larger voice is calling me.
--
Alan J. Mayer
SGI
[email protected]
WORK: 651-683-3131
HOME: 651-407-0134
--

2008-03-26 16:06:52

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines

Hi!

> Subject: [PATCH] x86_64: resize NR_IRQS for large machines
>
> From: Alan Mayer <[email protected]>
>
> On machines with very large numbers of cpus, tables that are dimensioned
> by NR_IRQS get very large, especially the irq_desc table. They are also
> very sparsely used. When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
> to set NR_IRQS, otherwise use NR_CPUS.
>
> Signed-off-by: Alan Mayer <[email protected]>
>
> Reviewed-by: Christoph Lameter <[email protected]>

> ===================================================================
> --- v2.6.25-rc6.orig/include/asm-x86/irq_64.h 2008-03-19 16:52:52.000000000 -0500
> +++ v2.6.25-rc6/include/asm-x86/irq_64.h 2008-03-20 16:46:51.000000000 -0500
> @@ -10,6 +10,10 @@
> * <[email protected]>
> */
>
> +#if !defined(MAX_IO_APICS)
> +#include <asm/apicdef.h>
> +#endif
> +

This is very ugly. Why not include it unconditionally -- with guard in
apicdef.h?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-03-26 16:31:00

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines



On Wed, 26 Mar 2008, Pavel Machek wrote:
>
> This is very ugly. Why not include it unconditionally -- with guard in
> apicdef.h?

I do agree that it's ugly, but I think the ugliness is more serious than
that.

What I think we should do is to make NR_IRQS no longer be a compile-time
constant, but instead just do something like

unsigned int NR_IRQS __read_mostly;

and then just set it early in the boot sequence depending on the real CPU
numbers etc.

I realize that this will require some changes to a few arrays that are
statically allocated and depend on NR_IRQ's (notably "irq_desc"), but
don't you guys think that this would be a cleaner thing?

[ I suspect that irq_desc[] itself could quite reasonably be a rather much
smaller __read_mostly hash-table of dynamically allocated entries - the
thing would be only modified at boot, so it should cache beautifully
even across hundreds of CPU's ]

Whatever. I'm not opposed to this whole static thing, but I do wonder if
it's worth doing that way. There *may* be performance reasons for doing it
the way we're doing it, but quite frankly, I think the #define is mostly
purely historical, from when it was just a fixed number (originally 16!)
and it made sense to think of it as a small static array.

Linus

2008-03-26 18:49:20

by Alan Mayer

[permalink] [raw]
Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines


On Wed, 26 Mar 2008, Pavel Machek wrote:

> Hi!
>
> > Subject: [PATCH] x86_64: resize NR_IRQS for large machines
> >
> > From: Alan Mayer <[email protected]>
> >
> > On machines with very large numbers of cpus, tables that are dimensioned
> > by NR_IRQS get very large, especially the irq_desc table. They are also
> > very sparsely used. When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
> > to set NR_IRQS, otherwise use NR_CPUS.
> >
> > Signed-off-by: Alan Mayer <[email protected]>
> >
> > Reviewed-by: Christoph Lameter <[email protected]>
>
> > ===================================================================
> > --- v2.6.25-rc6.orig/include/asm-x86/irq_64.h 2008-03-19 16:52:52.000000000 -0500
> > +++ v2.6.25-rc6/include/asm-x86/irq_64.h 2008-03-20 16:46:51.000000000 -0500
> > @@ -10,6 +10,10 @@
> > * <[email protected]>
> > */
> >
> > +#if !defined(MAX_IO_APICS)
> > +#include <asm/apicdef.h>
> > +#endif
> > +
>
> This is very ugly. Why not include it unconditionally -- with guard in
> apicdef.h?
> Pavel

Okay, I can change that.

--ajm

> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>

--
Alan J. Mayer
SGI
[email protected]
WORK: 651-683-3131
HOME: 651-407-0134
--

2008-03-26 18:55:36

by Alan Mayer

[permalink] [raw]
Subject: Re: [PATCH] x86_64: resize NR_IRQS for large machines

On Wed, 26 Mar 2008, Linus Torvalds wrote:

>
>
> On Wed, 26 Mar 2008, Pavel Machek wrote:
> >
> > This is very ugly. Why not include it unconditionally -- with guard in
> > apicdef.h?
>
> I do agree that it's ugly, but I think the ugliness is more serious than
> that.
>
> What I think we should do is to make NR_IRQS no longer be a compile-time
> constant, but instead just do something like
>
> unsigned int NR_IRQS __read_mostly;
>
> and then just set it early in the boot sequence depending on the real CPU
> numbers etc.
>
> I realize that this will require some changes to a few arrays that are
> statically allocated and depend on NR_IRQ's (notably "irq_desc"), but
> don't you guys think that this would be a cleaner thing?
>
> [ I suspect that irq_desc[] itself could quite reasonably be a rather much
> smaller __read_mostly hash-table of dynamically allocated entries - the
> thing would be only modified at boot, so it should cache beautifully
> even across hundreds of CPU's ]
>
> Whatever. I'm not opposed to this whole static thing, but I do wonder if
> it's worth doing that way. There *may* be performance reasons for doing it
> the way we're doing it, but quite frankly, I think the #define is mostly
> purely historical, from when it was just a fixed number (originally 16!)
> and it made sense to think of it as a small static array.
>
> Linus
>

Well, I was looking at it from that point of view. But, when I found myself
looking at code, particularly in drivers, that indexed into the irq_desc array
and started modifying the descriptor in place and then calling setup_irq(),
I realized that what was needed was a redesign of the whole mess from first
principals. I still think that's what needs to be done, but by some one with
more experience and credibility than me. Maybe in a year I'd be willing
to attempt it, but not today.

--ajm


--
Alan J. Mayer
SGI
[email protected]
WORK: 651-683-3131
HOME: 651-407-0134
--