2004-01-14 00:27:19

by James Cleverdon

[permalink] [raw]
Subject: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

Problem: Earlier I didn't consider the case of the generic sub-arch and
uni-proc installer kernels used by a number of distros. It currently is
scaled by NR_CPUS. The correct values should be big for summit and generic,
and can stay the same for all others.


diff -pru 2.6.1-mm2/include/asm-i386/mach-default/irq_vectors.h
t1mm2/include/asm-i386/mach-default/irq_vectors.h
--- 2.6.1-mm2/include/asm-i386/mach-default/irq_vectors.h 2004-01-08
22:59:19.000000000 -0800
+++ t1mm2/include/asm-i386/mach-default/irq_vectors.h 2004-01-13
13:43:56.000000000 -0800
@@ -90,8 +90,12 @@
#else
#ifdef CONFIG_X86_IO_APIC
#define NR_IRQS 224
-# if (224 >= 32 * NR_CPUS)
-# define NR_IRQ_VECTORS NR_IRQS
+/*
+ * For Summit or generic (i.e. installer) kernels, we have lots of I/O APICs,
+ * even with uni-proc kernels, so use a big array.
+ */
+# if defined(CONFIG_X86_SUMMIT) || defined(CONFIG_X86_GENERICARCH)
+# define NR_IRQ_VECTORS 1024
# else
# define NR_IRQ_VECTORS (32 * NR_CPUS)
# endif


Attachments:
(No filename) (997.00 B)
irq_vector_2004-01-13_2.6.1-mm2 (728.00 B)
Download all attachments

2004-01-14 01:01:08

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Tue, 13 Jan 2004, James Cleverdon wrote:

> Problem: Earlier I didn't consider the case of the generic sub-arch and
> uni-proc installer kernels used by a number of distros. It currently is
> scaled by NR_CPUS. The correct values should be big for summit and generic,
> and can stay the same for all others.

This all looks strange, especially in assign_irq_vector() does this
mean that you'll try and allocate up to 1024 vectors?

> diff -pru 2.6.1-mm2/include/asm-i386/mach-default/irq_vectors.h
> t1mm2/include/asm-i386/mach-default/irq_vectors.h
> --- 2.6.1-mm2/include/asm-i386/mach-default/irq_vectors.h 2004-01-08
> 22:59:19.000000000 -0800
> +++ t1mm2/include/asm-i386/mach-default/irq_vectors.h 2004-01-13
> 13:43:56.000000000 -0800
> @@ -90,8 +90,12 @@
> #else
> #ifdef CONFIG_X86_IO_APIC
> #define NR_IRQS 224
> -# if (224 >= 32 * NR_CPUS)
> -# define NR_IRQ_VECTORS NR_IRQS
> +/*
> + * For Summit or generic (i.e. installer) kernels, we have lots of I/O APICs,
> + * even with uni-proc kernels, so use a big array.
> + */
> +# if defined(CONFIG_X86_SUMMIT) || defined(CONFIG_X86_GENERICARCH)
> +# define NR_IRQ_VECTORS 1024
> # else
> # define NR_IRQ_VECTORS (32 * NR_CPUS)
> # endif


Attachments:
irq_vector_2004-01-13_2.6.1-mm2 (728.00 B)

2004-01-14 19:38:01

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

I don't think this:

+# if defined(CONFIG_X86_SUMMIT) || defined(CONFIG_X86_GENERICARCH)

Is a good idea. You're contaminating the default subarch with another
subarch specific #define.

generic arch additions are fine here, but you should find a better way
to abstract the summit stuff.

James


2004-01-14 19:59:57

by James Cleverdon

[permalink] [raw]
Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Tuesday 13 January 2004 5:00 pm, Zwane Mwaikambo wrote:
> On Tue, 13 Jan 2004, James Cleverdon wrote:
> > Problem: Earlier I didn't consider the case of the generic sub-arch and
> > uni-proc installer kernels used by a number of distros. It currently is
> > scaled by NR_CPUS. The correct values should be big for summit and
> > generic, and can stay the same for all others.
>
> This all looks strange, especially in assign_irq_vector() does this
> mean that you'll try and allocate up to 1024 vectors?

The irq_vector array name is a bit misleading. It contains the vectors for
each potential I/O APIC RTE. The array needs to be at least the sum of all
the RTEs in the system. Summit PCI bridge chips have large I/O APICs (50
RTEs), and a large system has up to 16 of them. 16*50 = 800. Allocating 1k
entries provides a pad for the future, and u8 doesn't cost much.


> > diff -pru 2.6.1-mm2/include/asm-i386/mach-default/irq_vectors.h
> > t1mm2/include/asm-i386/mach-default/irq_vectors.h
> > --- 2.6.1-mm2/include/asm-i386/mach-default/irq_vectors.h 2004-01-08
> > 22:59:19.000000000 -0800
> > +++ t1mm2/include/asm-i386/mach-default/irq_vectors.h 2004-01-13
> > 13:43:56.000000000 -0800
> > @@ -90,8 +90,12 @@
> > #else
> > #ifdef CONFIG_X86_IO_APIC
> > #define NR_IRQS 224
> > -# if (224 >= 32 * NR_CPUS)
> > -# define NR_IRQ_VECTORS NR_IRQS
> > +/*
> > + * For Summit or generic (i.e. installer) kernels, we have lots of I/O
> > APICs, + * even with uni-proc kernels, so use a big array.
> > + */
> > +# if defined(CONFIG_X86_SUMMIT) || defined(CONFIG_X86_GENERICARCH)
> > +# define NR_IRQ_VECTORS 1024
> > # else
> > # define NR_IRQ_VECTORS (32 * NR_CPUS)
> > # endif

--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot comm

2004-01-14 20:03:48

by James Cleverdon

[permalink] [raw]
Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Wednesday 14 January 2004 11:34 am, James Bottomley wrote:
> I don't think this:
>
> +# if defined(CONFIG_X86_SUMMIT) || defined(CONFIG_X86_GENERICARCH)
>
> Is a good idea. You're contaminating the default subarch with another
> subarch specific #define.
>
> generic arch additions are fine here, but you should find a better way
> to abstract the summit stuff.
>
> James

Sorry, but we've had distro installer kernels, which are uni-proc generic
subarch kernels, blow up with array overflows on large systems.

What would you suggest I do instead?

--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot comm

2004-01-14 21:49:47

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Wed, 2004-01-14 at 15:01, James Cleverdon wrote:
> Sorry, but we've had distro installer kernels, which are uni-proc generic
> subarch kernels, blow up with array overflows on large systems.
>
> What would you suggest I do instead?

For Summit just add an irq_vectors.h file that includes
mach-generic/irq-vectors.h and overrides the NR_IRQ_VECTORS value

For genericarch, I'm less certain, but it seems, given the idea behind
genericarch that it should probably be set at runtime when the arch type
becomes known

James


2004-01-14 21:52:29

by Nakajima, Jun

[permalink] [raw]
Subject: RE: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

assign_irq_vector() is okay, and it simply returns vectors
(FIRST_DEVICE_VECTOR <= vector < FIRST_SYSTEM_VECTOR). That means those
IRQs will eventually share the same vector(s). Look at the code.

Jun

> -----Original Message-----
> From: [email protected] [mailto:linux-kernel-
> [email protected]] On Behalf Of Zwane Mwaikambo
> Sent: Tuesday, January 13, 2004 5:00 PM
> To: James Cleverdon
> Cc: Andrew Morton; [email protected]; Linus Torvalds; Chris
> McDermott; Martin J. Bligh
> Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic
> subarch UP installer kernels
>
> On Tue, 13 Jan 2004, James Cleverdon wrote:
>
> > Problem: Earlier I didn't consider the case of the generic sub-arch
and
> > uni-proc installer kernels used by a number of distros. It
currently is
> > scaled by NR_CPUS. The correct values should be big for summit and
> generic,
> > and can stay the same for all others.
>
> This all looks strange, especially in assign_irq_vector() does this
> mean that you'll try and allocate up to 1024 vectors?
>
> > diff -pru 2.6.1-mm2/include/asm-i386/mach-default/irq_vectors.h
> > t1mm2/include/asm-i386/mach-default/irq_vectors.h
> > --- 2.6.1-mm2/include/asm-i386/mach-default/irq_vectors.h
2004-01-08
> > 22:59:19.000000000 -0800
> > +++ t1mm2/include/asm-i386/mach-default/irq_vectors.h
2004-01-13
> > 13:43:56.000000000 -0800
> > @@ -90,8 +90,12 @@
> > #else
> > #ifdef CONFIG_X86_IO_APIC
> > #define NR_IRQS 224
> > -# if (224 >= 32 * NR_CPUS)
> > -# define NR_IRQ_VECTORS NR_IRQS
> > +/*
> > + * For Summit or generic (i.e. installer) kernels, we have lots of
I/O
> APICs,
> > + * even with uni-proc kernels, so use a big array.
> > + */
> > +# if defined(CONFIG_X86_SUMMIT) || defined(CONFIG_X86_GENERICARCH)
> > +# define NR_IRQ_VECTORS 1024
> > # else
> > # define NR_IRQ_VECTORS (32 * NR_CPUS)
> > # endif

2004-01-14 22:21:52

by Zwane Mwaikambo

[permalink] [raw]
Subject: RE: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Wed, 14 Jan 2004, Nakajima, Jun wrote:

> assign_irq_vector() is okay, and it simply returns vectors
> (FIRST_DEVICE_VECTOR <= vector < FIRST_SYSTEM_VECTOR). That means those
> IRQs will eventually share the same vector(s). Look at the code.

Ok so you have different irqs sharing vectors, how does this work with the
following (where $vector really means $irq of course)?

.data
ENTRY(interrupt)
.text

vector=0
ENTRY(irq_entries_start)
.rept NR_IRQS
ALIGN
1: pushl $vector-256
jmp common_interrupt

Also i was thinking about when you exhaust all 200 odd
spare vectors and then end up doing set_intr_gate twice on the same
vector? I may be missing something really obvious here.

Thanks

2004-01-14 22:24:10

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Wed, 14 Jan 2004, James Cleverdon wrote:

> On Tuesday 13 January 2004 5:00 pm, Zwane Mwaikambo wrote:
> > On Tue, 13 Jan 2004, James Cleverdon wrote:
> > > Problem: Earlier I didn't consider the case of the generic sub-arch and
> > > uni-proc installer kernels used by a number of distros. It currently is
> > > scaled by NR_CPUS. The correct values should be big for summit and
> > > generic, and can stay the same for all others.
> >
> > This all looks strange, especially in assign_irq_vector() does this
> > mean that you'll try and allocate up to 1024 vectors?
>
> The irq_vector array name is a bit misleading. It contains the vectors for
> each potential I/O APIC RTE. The array needs to be at least the sum of all
> the RTEs in the system. Summit PCI bridge chips have large I/O APICs (50
> RTEs), and a large system has up to 16 of them. 16*50 = 800. Allocating 1k
> entries provides a pad for the future, and u8 doesn't cost much.

Ok i understand the need to accomodate all the RTEs, but what i was
considering was what happens when assign_irq_vector runs out of vectors to
allocate and ends up "wrapping". I just sent an email to Jun Nakajima with
a bit more detail.

Thanks,
Zwane

2004-01-14 23:28:11

by Nakajima, Jun

[permalink] [raw]
Subject: RE: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

I tend to agree. I think the confusing part is the range of the IRQs on
that machine. Assuming that irq_vector[NR_IRQ_VECTORS = 1024] requires
more entries, then the IRQs should take that range, because
IO_APCI_VECTOR(irq) is just irq_vector[irq], for example. If NR_IRQS is
still 224, how can do_IRQ() can get the correct IRQ (i.e. >= 224) ? So
in that case, the IRQ should be smaller than 224, then irq_vector[]
should be smaller.

> > #else
> > #ifdef CONFIG_X86_IO_APIC
> > #define NR_IRQS 224
> > -# if (224 >= 32 * NR_CPUS)
> > -# define NR_IRQ_VECTORS NR_IRQS
> > +/*
> > + * For Summit or generic (i.e. installer) kernels, we have lots of
I/O
> > APICs, + * even with uni-proc kernels, so use a big array.
> > + */
> > +# if defined(CONFIG_X86_SUMMIT) || defined(CONFIG_X86_GENERICARCH)
> > +# define NR_IRQ_VECTORS 1024
> > # else
> > # define NR_IRQ_VECTORS (32 * NR_CPUS)
> > # endif

Jun

> -----Original Message-----
> From: Zwane Mwaikambo [mailto:[email protected]]
> Sent: Wednesday, January 14, 2004 2:21 PM
> To: Nakajima, Jun
> Cc: James Cleverdon; Andrew Morton; [email protected];
Linus
> Torvalds; Chris McDermott; Martin J. Bligh
> Subject: RE: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic
> subarch UP installer kernels
>
> On Wed, 14 Jan 2004, Nakajima, Jun wrote:
>
> > assign_irq_vector() is okay, and it simply returns vectors
> > (FIRST_DEVICE_VECTOR <= vector < FIRST_SYSTEM_VECTOR). That means
those
> > IRQs will eventually share the same vector(s). Look at the code.
>
> Ok so you have different irqs sharing vectors, how does this work with
the
> following (where $vector really means $irq of course)?
>
> .data
> ENTRY(interrupt)
> .text
>
> vector=0
> ENTRY(irq_entries_start)
> .rept NR_IRQS
> ALIGN
> 1: pushl $vector-256
> jmp common_interrupt
>
> Also i was thinking about when you exhaust all 200 odd
> spare vectors and then end up doing set_intr_gate twice on the same
> vector? I may be missing something really obvious here.
>
> Thanks

2004-01-15 04:37:11

by Zwane Mwaikambo

[permalink] [raw]
Subject: RE: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Wed, 14 Jan 2004, Nakajima, Jun wrote:

> I tend to agree. I think the confusing part is the range of the IRQs on
> that machine. Assuming that irq_vector[NR_IRQ_VECTORS = 1024] requires
> more entries, then the IRQs should take that range, because
> IO_APCI_VECTOR(irq) is just irq_vector[irq], for example. If NR_IRQS is
> still 224, how can do_IRQ() can get the correct IRQ (i.e. >= 224) ? So
> in that case, the IRQ should be smaller than 224, then irq_vector[]
> should be smaller.

In my opinion we should be breaking after we've exceeded the maximum
external vectors we can install. This will of course mean less than
the number of RTEs. James have you actually managed to use the devices
connected to the high (over ~224) RTEs?

2004-01-15 21:57:57

by James Cleverdon

[permalink] [raw]
Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Wednesday 14 January 2004 8:36 pm, Zwane Mwaikambo wrote:
> On Wed, 14 Jan 2004, Nakajima, Jun wrote:
> > I tend to agree. I think the confusing part is the range of the IRQs on
> > that machine. Assuming that irq_vector[NR_IRQ_VECTORS = 1024] requires
> > more entries, then the IRQs should take that range, because
> > IO_APCI_VECTOR(irq) is just irq_vector[irq], for example. If NR_IRQS is
> > still 224, how can do_IRQ() can get the correct IRQ (i.e. >= 224) ? So
> > in that case, the IRQ should be smaller than 224, then irq_vector[]
> > should be smaller.
>
> In my opinion we should be breaking after we've exceeded the maximum
> external vectors we can install. This will of course mean less than
> the number of RTEs. James have you actually managed to use the devices
> connected to the high (over ~224) RTEs?

No, I haven't exceeded the available vectors, but wli has on a large NUMA-Q
box.

The x440 and x445's problems are pre-reserving lots of bus numbers in the
BIOS, more than one per PCI slot. They must be anticipating PCI cards with
bridge chips on them.

I believe that the reason for irq_vector being so large is to allow IRQ (and
eventually vector) sharing. The array is to map from RTE to vector.

--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot comm

2004-01-15 22:38:08

by Mika Penttilä

[permalink] [raw]
Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels



James Cleverdon wrote:

>On Wednesday 14 January 2004 8:36 pm, Zwane Mwaikambo wrote:
>
>
>>On Wed, 14 Jan 2004, Nakajima, Jun wrote:
>>
>>
>>>I tend to agree. I think the confusing part is the range of the IRQs on
>>>that machine. Assuming that irq_vector[NR_IRQ_VECTORS = 1024] requires
>>>more entries, then the IRQs should take that range, because
>>>IO_APCI_VECTOR(irq) is just irq_vector[irq], for example. If NR_IRQS is
>>>still 224, how can do_IRQ() can get the correct IRQ (i.e. >= 224) ? So
>>>in that case, the IRQ should be smaller than 224, then irq_vector[]
>>>should be smaller.
>>>
>>>
>>In my opinion we should be breaking after we've exceeded the maximum
>>external vectors we can install. This will of course mean less than
>>the number of RTEs. James have you actually managed to use the devices
>>connected to the high (over ~224) RTEs?
>>
>>
>
>No, I haven't exceeded the available vectors, but wli has on a large NUMA-Q
>box.
>
>The x440 and x445's problems are pre-reserving lots of bus numbers in the
>BIOS, more than one per PCI slot. They must be anticipating PCI cards with
>bridge chips on them.
>
>I believe that the reason for irq_vector being so large is to allow IRQ (and
>eventually vector) sharing. The array is to map from RTE to vector.
>
>
Any attempt to setup an irq >= NR_IRQS will crash, because for instance
entry.c interrupt stubs are an array of NR_IRQS entries...NR_IRQ_VECTORS
> NR_IRQS really doesn't make sense as is.

We do support irq sharing among devices, but not vector sharing among
irqs. For that the handler should loop through irq_vector[] to find
every index, index != irq, irq_vector[index] == irq_vector[irq].

--Mika


2004-01-15 22:42:22

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Thu, 15 Jan 2004, James Cleverdon wrote:

> > In my opinion we should be breaking after we've exceeded the maximum
> > external vectors we can install. This will of course mean less than
> > the number of RTEs. James have you actually managed to use the devices
> > connected to the high (over ~224) RTEs?
>
> No, I haven't exceeded the available vectors, but wli has on a large NUMA-Q
> box.

Yes i believe the 8 node NUMA-Qs do this easily.

> The x440 and x445's problems are pre-reserving lots of bus numbers in the
> BIOS, more than one per PCI slot. They must be anticipating PCI cards with
> bridge chips on them.

For some odd reason i thought we had dynamic allocated MP busses.

> I believe that the reason for irq_vector being so large is to allow IRQ (and
> eventually vector) sharing. The array is to map from RTE to vector.

What i'm trying to say is that the current code will behave badly when you
actually do encounter a system with that many RTEs, the actual array
being that large isn't a problem, the problem is what's going on in
assign_irq_vector() and later on with set_intr_gate(). By that i mean;

- The interrupt[] stub in entry.S is written with 256 irqs in mind so it
wouldn't be able to pass higher irq numbers to do_IRQ in it's current
state.

- "Wrapping" in assign_irq_vector means that you overrite previously
assigned vector entries in the IDT with whatever interrupt[irq] you happen
to be installing at the time. To avoid this you should not be allowing
more than ~224 vectors to be handed out by assign_irq_vector(), with the
patch and the current code, this is possible.

Lastly i think we should avoid sharing vectors, going the IA64 route and
setting up irq handling domains of sorts would allow for easier scaling
both up and down.

Thanks,
Zwane

2004-01-16 02:35:57

by Nakajima, Jun

[permalink] [raw]
Subject: RE: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

What happens first is you will see kernel data corruption as irq exceeds
224 (NR_IRQS).

In setup_IO_APIC_irqs(), for example, we do
if (IO_APIC_IRQ(irq)) {
vector = assign_irq_vector(irq);
entry.vector = vector;
ioapic_register_intr(irq, vector, IOAPIC_AUTO);

if (!apic && (irq < 16))
disable_8259A_irq(irq);
}

Then ioapic_register_intr() does:
if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
trigger == IOAPIC_LEVEL)
irq_desc[irq].handler = &ioapic_level_type;
else
irq_desc[irq].handler = &ioapic_edge_type;
set_intr_gate(vector, interrupt[irq]);

Now of course irq_desc_t irq_desc[NR_IRQS], and same with interrupt[].

I think we need to add checking in IO_APIC_IRQ(irq) like from
#define IO_APIC_IRQ(x) (((x) >= 16) || ((1<<(x)) & io_apic_irqs)) to

#define IO_APIC_IRQ(x) (((x) < NR_IRQS) && ((x) >= 16) || ((1<<(x)) &
io_apic_irqs))

Jun


> -----Original Message-----
> From: Zwane Mwaikambo [mailto:[email protected]]
> Sent: Thursday, January 15, 2004 2:41 PM
> To: James Cleverdon
> Cc: Nakajima, Jun; Andrew Morton; Linux Kernel; Linus Torvalds; Chris
> McDermott; Martin J. Bligh
> Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic
> subarch UP installer kernels
>
> On Thu, 15 Jan 2004, James Cleverdon wrote:
>
> > > In my opinion we should be breaking after we've exceeded the
maximum
> > > external vectors we can install. This will of course mean less
than
> > > the number of RTEs. James have you actually managed to use the
devices
> > > connected to the high (over ~224) RTEs?
> >
> > No, I haven't exceeded the available vectors, but wli has on a large
> NUMA-Q
> > box.
>
> Yes i believe the 8 node NUMA-Qs do this easily.
>
> > The x440 and x445's problems are pre-reserving lots of bus numbers
in
> the
> > BIOS, more than one per PCI slot. They must be anticipating PCI
cards
> with
> > bridge chips on them.
>
> For some odd reason i thought we had dynamic allocated MP busses.
>
> > I believe that the reason for irq_vector being so large is to allow
IRQ
> (and
> > eventually vector) sharing. The array is to map from RTE to vector.
>
> What i'm trying to say is that the current code will behave badly when
you
> actually do encounter a system with that many RTEs, the actual array
> being that large isn't a problem, the problem is what's going on in
> assign_irq_vector() and later on with set_intr_gate(). By that i mean;
>
> - The interrupt[] stub in entry.S is written with 256 irqs in mind so
it
> wouldn't be able to pass higher irq numbers to do_IRQ in it's current
> state.
>
> - "Wrapping" in assign_irq_vector means that you overrite previously
> assigned vector entries in the IDT with whatever interrupt[irq] you
happen
> to be installing at the time. To avoid this you should not be allowing
> more than ~224 vectors to be handed out by assign_irq_vector(), with
the
> patch and the current code, this is possible.
>
> Lastly i think we should avoid sharing vectors, going the IA64 route
and
> setting up irq handling domains of sorts would allow for easier
scaling
> both up and down.
>
> Thanks,
> Zwane

2004-01-16 05:45:38

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Thu, 15 Jan 2004, James Cleverdon wrote:
>> No, I haven't exceeded the available vectors, but wli has on a large
>> NUMA-Q box.

On Thu, Jan 15, 2004 at 05:40:54PM -0500, Zwane Mwaikambo wrote:
> Yes i believe the 8 node NUMA-Qs do this easily.

The mainline kernel panics before console_init() on the 9th IO-APIC
(the first IO-APIC on the 5th node) and this has been an extreme
annoyance essentially since Linux was ported to the beasts.

It won't be long before currently shipping ia32 boxen catch up with
that blast from the 4+ -year-old past, and x86-64 or whatever will have
similar issues if/when it ever gets chipsets allowing similar numbers
of cpus and devices (presumably they're correlated) to be attached,
since AFAIK it didn't change the number of IDT entries.

One thing that's particularly asinine is that since the things use
physical destinations in the RTE's and use a serial APIC bus or
whatever per node (i.e. can only interrupt cpus in their node), they
really are just a tabulation method away from functioning properly.

-- wli

2004-01-18 19:07:43

by Zwane Mwaikambo

[permalink] [raw]
Subject: RE: [PATCH] 2.6.1-mm2: Get irq_vector size right for generic subarch UP installer kernels

On Thu, 15 Jan 2004, Nakajima, Jun wrote:

> What happens first is you will see kernel data corruption as irq exceeds
> 224 (NR_IRQS).
>
> In setup_IO_APIC_irqs(), for example, we do
> if (IO_APIC_IRQ(irq)) {
> vector = assign_irq_vector(irq);
> entry.vector = vector;
> ioapic_register_intr(irq, vector, IOAPIC_AUTO);
>
> if (!apic && (irq < 16))
> disable_8259A_irq(irq);
> }
>
> Then ioapic_register_intr() does:
> if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
> trigger == IOAPIC_LEVEL)
> irq_desc[irq].handler = &ioapic_level_type;
> else
> irq_desc[irq].handler = &ioapic_edge_type;
> set_intr_gate(vector, interrupt[irq]);
>
> Now of course irq_desc_t irq_desc[NR_IRQS], and same with interrupt[].
>
> I think we need to add checking in IO_APIC_IRQ(irq) like from
> #define IO_APIC_IRQ(x) (((x) >= 16) || ((1<<(x)) & io_apic_irqs)) to
>
> #define IO_APIC_IRQ(x) (((x) < NR_IRQS) && ((x) >= 16) || ((1<<(x)) &
> io_apic_irqs))

That looks fine to me, early bailout is safest. James is this ok with you?

Thanks,
Zwane