2002-12-20 22:49:53

by Protasevich, Natalie

Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

> On Thu, Dec 19, 2002 at 06:04:55PM -0800, James Cleverdon wrote:
> >>> A generic patch should also support Unisys' new box, the ES7000 or
> >>> some such.
>
> On Fri, Dec 20, 2002 at 08:00:50AM +0000, Christoph Hellwig wrote:
> >> That box needs more changes than just the apic setup. Unfortunately
> >> unisys thinks they shouldn't send their patches to lkml, but when you see
> >> them e.g. in the suse tree it's a bit understandable that they don't want
> >> anyone to really see their mess :)

Briefly, our ES7000 boxes are non-NUMA, but use clustered APICs (logical
with Cascades, and physical with Gallatins/Fosters). Our code is pretty much
within the clustered APIC code (when both physical and logical are
implemented). Even when NUMA is forced on in the clustered APIC case, we are
usually OK as a single-node case.
There are only a few problems with porting the Linux kernel to the ES7000:
- we use 8-bit APIC IDs - this makes us use APIC_LDR instead of
  APIC_ID throughout the code;
- we have special RTE destination values on the IO-APIC - the "if" in the
  code that programs the IO-APIC lines;
- we introduce a severe IRQ override case - we remap ISA interrupts to a
  different interrupt range (all the "i < 16" clauses).
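
To illustrate the first point, here is a minimal sketch. It is not taken from the ES7000 patch, and it assumes the trouble is the 4-bit mask that generic code of this era applies when reading the APIC_ID register, while the logical ID in APIC_LDR is read as a full 8 bits. The helper names are hypothetical; both fields are assumed to sit in bits 24-31 of their registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not kernel API. Both registers are assumed to
 * carry their ID in the top byte (bits 24-31) of a 32-bit value. */

/* Physical ID as read through a 4-bit mask: only the low nibble of
 * the top byte survives. */
static inline uint8_t apic_id_4bit(uint32_t apic_id_reg)
{
    return (uint8_t)((apic_id_reg >> 24) & 0x0F);
}

/* Logical ID from APIC_LDR: the whole top byte is kept. */
static inline uint8_t apic_logical_id(uint32_t apic_ldr_reg)
{
    return (uint8_t)((apic_ldr_reg >> 24) & 0xFF);
}

/* True when an 8-bit ID cannot be represented after 4-bit masking. */
static inline int id_truncated(uint32_t reg)
{
    return apic_id_4bit(reg) != apic_logical_id(reg);
}
```

With a register value of 0x23000000, the 4-bit read yields 0x03 while the 8-bit read yields 0x23, which is why any ID above 0x0F forces a different accessor.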

Also, I usually have to add things like XTPR mechanism for Fosters/Gallatins
and disable conventional IRQ balancing, since our IO-APIC doesn't work this
way... (All of the above is in the SuSE code base).

I worked with the SuSE tree, which has clustered code that is (at first
glance) close to the patch being discussed here.
The 2.5 tree gives us the benefit of the subarch, which will (hopefully)
accommodate our special cases.
But I may need to add more hooks.

>No need to sugar-coat anything :-)

>Natalie is the engineer who added support for the ES7000 to Linux.
>Fortunately she is in the cube next to me.

>She has sent the patches to SuSE/United Linux, and is in the final process
>of testing them on 2.5.5x before submitting them to LKML for comment.

> >> And btw, the box isn't that new, but three years ago or so when they first
> >> showed it on cebit they even refused to talk about linux due to their
> >> restrictive agreements with Microsoft..
>
> On Fri, Dec 20, 2002 at 03:24:01AM -0800, William Lee Irwin III wrote:
> > Kevin, you're the only lkml-posting contact point I know of within Unisys.
> > Is there any chance you could flag down some of the ia32 crew there for
> > some commentary on this stuff? (or do so yourself if you're in it)

I will be looking at the Intel patch submitted against 2.4 with support for
the ES7000 in mind. I am trying to get the ES7000 patch for 2.5.x out
sometime next week (my boss won't let me have a life until I get ES7000
support in 2.5 (:-<)). At the same time, we are very interested in any
clustered APIC patch that goes in the 2.5 tree (the sooner the better). Having
physical cluster support in 2.5 would greatly reduce the size of diffs for
the ES7000.

>I mostly work on our 16-32p IA64 machines. Natalie or someone else will
>have to comment on the clustered-apic code.

>I do know that a lot of the code for the ES7000 is optional, and only
>required to support value-added management functionality, which is
>especially useful if you are running more than one OS instance on the
>machine (it supports 8 fully-independent partitions).

>Also, as a clarification, our 32-processor systems are NOT NUMA: there
>is a full non-blocking crossbar to memory. So clustered APIC support
>should not be dependent on NUMA.

>Kevin


2002-12-20 23:26:39

by William Lee Irwin III

Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

On Fri, Dec 20, 2002 at 04:57:28PM -0600, Protasevich, Natalie wrote:
> Briefly, our ES7000 boxes are non-NUMA, but use clustered APICs (logical
> with Cascades, and physical with Gallatins/Fosters). Our code is pretty much
> within the clustered APIC code (when both physical and logical are
> implemented). Even when NUMA is forced on in the clustered APIC case, we are
> usually OK as a single-node case.

Okay, so nothing wild like a non-APIC interrupt controller is going on
here (cf. Voyager for an example).


On Fri, Dec 20, 2002 at 04:57:28PM -0600, Protasevich, Natalie wrote:
> There are only a few problems with porting the Linux kernel to the ES7000:
> - we use 8-bit APIC IDs - this makes us use APIC_LDR instead of
>   APIC_ID throughout the code;
> - we have special RTE destination values on the IO-APIC - the "if" in the
>   code that programs the IO-APIC lines;
> - we introduce a severe IRQ override case - we remap ISA interrupts to a
>   different interrupt range (all the "i < 16" clauses).
> Also, I usually have to add things like XTPR mechanism for Fosters/Gallatins
> and disable conventional IRQ balancing, since our IO-APIC doesn't work this
> way... (All of the above is in the SuSE code base).

Venkatesh, do you think you can handle these generically? Aside from
machine-specific configurations this all looks perfectly generic.

If it's publicly discussable, what's the difference wrt. the IO-APIC?
IIRC NUMA-Q had a similar issue, where flat logical destinations were
being programmed into the IO-APIC by the IRQ balancing code, but the
NUMA-Q IO-APIC was programmed to accept physical destinations in the
RTE's via the DESTMOD bit, using physical broadcast by default, and
achieving node-locality as physical destinations may not refer to
off-node cpus. There probably isn't an issue of node locality, but even
if the IO-APIC's are programmed for logical DESTMOD it won't work with
the flat logical gunk the original IRQ balance patch programmed up.

From 2.5.52 include/asm-i386/smp.h:

#ifdef CONFIG_CLUSTERED_APIC
#define INT_DELIVERY_MODE 0 /* physical delivery on LOCAL quad */
#else
#define INT_DELIVERY_MODE 1 /* logical delivery broadcast to all procs */
#endif


From 2.5.52 arch/i386/mach-generic/mach_apic.h:

#ifdef CONFIG_SMP
#define TARGET_CPUS (clustered_apic_mode ? 0xf : cpu_online_map)
#else
#define TARGET_CPUS 0x01
#endif

And while setting up the RTE's in io_apic.c:

entry.delivery_mode = dest_LowestPrio;
entry.dest_mode = INT_DELIVERY_MODE;
entry.mask = 0; /* enable IRQ */
entry.dest.logical.logical_dest = TARGET_CPUS;

... which is rather blatant abuse of entry.dest.logical.logical_dest
for the NUMA-Q case, but never mind that.
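
Pulling the three fragments above together, a rough standalone sketch of the resulting RTE might look like the following. The struct and function names are hypothetical and the fields are plain bytes rather than the real bitfield layout. As a simplification, a single runtime flag stands in for both CONFIG_CLUSTERED_APIC and clustered_apic_mode.

```c
#include <assert.h>
#include <stdint.h>

/* Values follow the quoted macros: dest_mode 0 = physical, 1 = logical. */
enum { DEST_MODE_PHYSICAL = 0, DEST_MODE_LOGICAL = 1 };
/* Lowest-priority delivery is encoded as 1 in the RTE delivery mode field. */
enum { DELIVERY_LOWEST_PRIO = 1 };

/* Hypothetical flattened view of a redirection table entry. */
struct rte_sketch {
    uint8_t delivery_mode;
    uint8_t dest_mode;
    uint8_t mask;   /* 0 = IRQ enabled */
    uint8_t dest;   /* physical ID or logical bitmask, depending on dest_mode */
};

static struct rte_sketch make_rte(int clustered, uint8_t cpu_online_map)
{
    struct rte_sketch e;

    assert(clustered == 0 || clustered == 1);
    e.delivery_mode = DELIVERY_LOWEST_PRIO;
    e.dest_mode = clustered ? DEST_MODE_PHYSICAL : DEST_MODE_LOGICAL;
    e.mask = 0;
    /* Clustered: 0xf is the physical broadcast ID, not a CPU bitmask.
     * Flat: one bit per online CPU. The same field carries both. */
    e.dest = clustered ? 0xf : cpu_online_map;
    return e;
}
```

The point of the sketch is that one destination field is interpreted two different ways depending on dest_mode, so any code that writes a flat logical bitmask into it without checking the mode will misroute interrupts on a clustered box.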


On Fri, Dec 20, 2002 at 04:57:28PM -0600, Protasevich, Natalie wrote:
> I worked with the SuSE tree, which has clustered code that is (at first
> glance) close to the patch being discussed here.
> The 2.5 tree gives us the benefit of the subarch, which will (hopefully)
> accommodate our special cases.
> But I may need to add more hooks.

It'd be great to have the APIC interface general enough to handle all
these machines.


Bill

2002-12-25 21:35:55

by Alan

Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

One thing I will say. Your code would be a hell of a lot saner for
merging if you mapped the ISA/Legacy IRQ's as 0-15 (to software) and the
PCI ones to 16+ like everyone else does. That would kill a _lot_ of
ifdefs and the IRQ0 corner case.
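
A minimal sketch of that kind of mapping, done once at the boundary so the rest of the code can keep its "i < 16" assumptions. The hardware range the ES7000 actually uses for ISA interrupts is not given in this thread, so HW_ISA_BASE below is a made-up value and the function name is hypothetical.

```c
#include <assert.h>

#define NR_ISA_IRQS 16
/* Hypothetical: first hardware interrupt number the platform routes
 * ISA interrupts to. The real value is not stated in this thread. */
#define HW_ISA_BASE 64

/* Translate a hardware interrupt number to the IRQ number software sees:
 * ISA/legacy lines land on 0-15, everything else on 16 and up. */
static int hw_to_sw_irq(int hw_irq)
{
    assert(hw_irq >= 0);

    if (hw_irq >= HW_ISA_BASE && hw_irq < HW_ISA_BASE + NR_ISA_IRQS)
        return hw_irq - HW_ISA_BASE;   /* remapped ISA range -> 0..15 */

    if (hw_irq < HW_ISA_BASE)
        return hw_irq + NR_ISA_IRQS;   /* shift up past the ISA block */

    return hw_irq;                     /* above the ISA window: unchanged */
}
```

The three ranges do not overlap after translation (0-15, 16-79, 80 and up with the base chosen here), so the mapping stays one-to-one and generic code never needs to know where the hardware put the legacy interrupts.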