2002-12-18 22:28:58

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs


2.4.21-pre1 (i386) based patch to fix the issues with systems having more than
8 CPUs, in a generic way.


Motivation:
The current APIC destination mode ("Flat Logical") used in linux kernel has
an upper limit of 8 CPUs. For more than 8 CPUs, either "Clustered Logical" or
"Physical" mode has to be used.
There is already some code in current kernel (2.4.21-pre1), to support such
conditions. Specifically, IBM Summit, uses Physical mode, and IBM NUMAQ
uses clustered mode. But, there are some issues too:
- it is not generic enough to support any other more than 8 CPU system
out of the box. Supporting different systems may need more hacks in the code.
- clustered mode setup is tightly coupled with NUMA system. Whereas, in reality,
we can as well have logical clusters in a non-NUMA system as well.
- physical mode setup is somewhat tightly coupled with xAPIC. But, xAPIC doesn't
necessarily imply physical mode. You can as well have clustered mode with xAPIC
- APIC destination mode is selected based on particular OEM string.

These reasons together led to panics on other OEM systems with > 8 CPUS. The
patch tries to fix this issue in a generic way (in place of having multiple
hacks for different OEMs). Note, the patch only intends to change the
initialization of systems with more than 8 CPUs and it will not affect
other systems (apart from possible bugs in my code itself).


Description:
The basic idea is to use the number of processors detected on the system, to
decide on which APIC destination mode is to be used. Once all the CPU info, is
collected either from ACPI or MP table, we can check the total number of
processors in the system.
If the number of processors in less than equal to 8,
then no change is required, and we can use the default, "Flat Logical" set up.
If the number of processors is more than 8
we can switch to clustered logical setup.
The logical clusters set up as follows.
Cluster 0 (CPU 0-3), Cluster 1 (CPU 4-7), Cluster 2 (CPU 8-11) and so on..

The other things that are done in the patch include:
- Separate out the NUMA specific stuff from APIC setup in cluster mode. Also,
NUMA has its own way of setting up the clusters, and doesn't follow the
logical cluster mapping defined above.
- Separate out xAPIC stuff from APIC destination setup. And the availability of
xAPIC support can actually be determined from the LAPIC version.
- physical mode support _removed_, as we can use clustered logical setup to
support can support upto a max of 60 CPUs. This is mainly because of the
advantage of being able to setup IOAPICs in LowestPrio, when using clustered mode.

The whole stuff is protected by 'Clustered APIC (> 8 CPUs) support
(CONFIG_X86_APIC_CLUSTER)' config option under Processor Type and Features.
But going forward, we can actually make this as default, as this doesn't
affect the systems with less than equal to 8 CPUs (Apart from minor increase
in code size and couple of additional checks during boot up), but gives the
default support to more than 8 CPU systems.


Please let me know your comments/criticisms about this.
I was able to test this successfully on an 8-way with HT(16 logical)
CPU systems that I have access to. But, I haven't tested it on x440, or NUMAQ
systems. Would love to hear about the effect of this patch on these systems.

Thanks,
-Venkatesh



> -----Original Message-----
> From: Nakajima, Jun
> Sent: Thursday, December 12, 2002 7:06 PM
> To: [email protected]; Zwane Mwaikambo
> Cc: Martin Bligh; John Stultz; Linux Kernel
> Subject: RE: [PATCH][2.5][RFC] Using xAPIC apic address space
> on !Summit
>
>
> BTW, we are working on a xAPIC patch that supports more than
> 8 CPUs in a
> generic fashion (don't use hardcode OEM checking). We already
> tested it on
> two OEM systems with 16 CPUs.
> - It uses clustered mode. We don't want to use physical mode
> because it does
> not support lowest priority delivery mode.
> - We also check the version of the local APIC if it's xAPIC
> or not. It's
> possible that some other x86 architecture (other than P4P) uses xAPIC.
>
> Stay tuned.
>
> Jun


Attachments:
cluster-2.4.21-pre1.patch (25.39 kB)
cluster-2.4.21-pre1.patch

2002-12-18 23:18:46

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

On Wed, Dec 18, 2002 at 02:36:20PM -0800, Pallipadi, Venkatesh wrote:
> These reasons together led to panics on other OEM systems with > 8 CPUS. The
> patch tries to fix this issue in a generic way (in place of having multiple
> hacks for different OEMs). Note, the patch only intends to change the
> initialization of systems with more than 8 CPUs and it will not affect
> other systems (apart from possible bugs in my code itself).

Any pointers to these systems?

> - Separate out xAPIC stuff from APIC destination setup. And the availability of
> xAPIC support can actually be determined from the LAPIC version.

Are you sure? IIRC some of the early summit boxens didn't report the
right versions..

> - physical mode support _removed_, as we can use clustered logical setup to
> support can support upto a max of 60 CPUs. This is mainly because of the
> advantage of being able to setup IOAPICs in LowestPrio, when using clustered mode.

does this really not break anything in the fragile summit setups?



- bool 'Multi-node NUMA system support' CONFIG_X86_NUMA
- if [ "$CONFIG_X86_NUMA" = "y" ]; then
+ bool 'Clustered APIC (> 8 CPUs) support' CONFIG_X86_APIC_CLUSTER
+ if [ "$CONFIG_X86_APIC_CLUSTER" = "y" ]; then
+ define_bool CONFIG_X86_CLUSTERED_APIC y

Do we really need CONFIG_X86_APIC_CLUSTER _and_ CONFIG_X86_CLUSTERED_APIC?

unsigned long id;
- if(clustered_apic_mode == CLUSTERED_APIC_XAPIC)
- id = physical_to_logical_apicid(hard_smp_processor_id());
+ if(clustered_apic_mode)
+ id = cpu_2_logical_apicid[smp_processor_id()];
else

Okay, this was wrong before, but any chance you could use proper
style here (i.e. if ()

id = 1UL << smp_processor_id();

- if (mp_ioapics[apic].mpc_apicid >= apic_broadcast_id) {
+ if ( !xapic_support &&
+ (mp_ioapics[apic].mpc_apicid >= apic_broadcast_id)) {

if (!xapic_support &&
(mp_ioapics[apic].mpc_apicid >= apic_broadcast_id)) {

+ if ( !xapic_support ) {

Again.

- if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
+ if (clustered_apic_mode &&
+ (configured_platform_type == CONFIGURED_PLATFORM_NUMA) ) {

Doesn;t configured_platform_type == CONFIGURED_PLATFORM_NUMA imply
clustered_apic_mode? and it should be at least CONFIGURED_PLATFORM_NUMAQ,
btw. Probably better something short like SUBARCH_NUMAQ..

Except of that the patch looks fine, but IMHO something like that should
get testing in 2.5 first.

2002-12-18 23:34:01

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

On Wed, Dec 18, 2002 at 11:26:40PM +0000, Christoph Hellwig wrote:
> Except of that the patch looks fine, but IMHO something like that should
> get testing in 2.5 first.

Yes, I'd prefer this happen in 2.5 first and that I and the rest of
everyone in our lab test the living daylights out of it on NUMA-Q.


Bill

2002-12-18 23:57:34

by Martin J. Bligh

[permalink] [raw]
Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

> - if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
> + if (clustered_apic_mode &&
> + (configured_platform_type == CONFIGURED_PLATFORM_NUMA) ) {

Arrrggh - no. Let's not create even more of an unholy mess than is there
already. The above is just vile.

> Except of that the patch looks fine, but IMHO something like that should
> get testing in 2.5 first.

Do it under subarch, in 2.5, and please wait until I merge the NUMA-Q
and Summit support that's working as is first. I'll send it out within
a week.

M.

PS. if people could change the email headers when replying to other
branches of this thread from [email protected] to [email protected],
I'd much appreciate it.

2002-12-19 00:23:14

by Martin J. Bligh

[permalink] [raw]
Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

First thing, can you split this into much smaller pieces, each of
which perform one code change ... then it might be more feasible
to read it.

> - bool 'Multi-node NUMA system support' CONFIG_X86_NUMA
> - if [ "$CONFIG_X86_NUMA" = "y" ]; then
> + bool 'Clustered APIC (> 8 CPUs) support' CONFIG_X86_APIC_CLUSTER
> + if [ "$CONFIG_X86_APIC_CLUSTER" = "y" ]; then
> + define_bool CONFIG_X86_CLUSTERED_APIC y
> #Platform Choices
> bool ' Multiquad (IBM/Sequent) NUMAQ support' CONFIG_X86_NUMAQ
> if [ "$CONFIG_X86_NUMAQ" = "y" ]; then
> - define_bool CONFIG_X86_CLUSTERED_APIC y
> - define_bool CONFIG_MULTIQUAD y
> - fi
> - bool ' IBM x440 (Summit/EXA) support' CONFIG_X86_SUMMIT
> - if [ "$CONFIG_X86_SUMMIT" = "y" ]; then
> - define_bool CONFIG_X86_CLUSTERED_APIC y
> + define_bool CONFIG_MULTIQUAD y

You seem to have lost turning on CONFIG_X86_NUMA.

> --- linux-2.4.21-pre1.org/arch/i386/defconfig 2002-11-28 15:53:09.000000000 -0800
> +++ linux-test1/arch/i386/defconfig 2002-12-14 14:59:52.000000000 -0800
> @@ -62,6 +62,7 @@
> # CONFIG_MATH_EMULATION is not set
> # CONFIG_MTRR is not set
> CONFIG_SMP=y
> +CONFIG_X86_APIC_CLUSTER=y
> # CONFIG_MULTIQUAD is not set
> CONFIG_HAVE_DEC_LOCK=y

Errrm ... on by default?

> - if(clustered_apic_mode == CLUSTERED_APIC_XAPIC)
> - id = physical_to_logical_apicid(hard_smp_processor_id());
> + if(clustered_apic_mode)
> + id = cpu_2_logical_apicid[smp_processor_id()];

Don't use those arrays directly, use the macros.
And that was off before for NUMA-Q ... you seem to have turned it on.
Unless you've inverted the meaning of clustered_apic_mode, which is
going to confuse the hell out of everyone?

> - if (clustered_apic_mode != CLUSTERED_APIC_NUMAQ) {
> + if (configured_platform_type != CONFIGURED_PLATFORM_NUMA) {

OK, what exactly are your switching rules here? Before:

if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) -> numaq only
if (clustered_apic_mode == CLUSTERED_APIC_XAPIC) -> x440
if (clustered_apic_mode) -> numaq or x440

Make sure you match that switching logic in whatever you do.
For instance, this whole section gets skipped for NUMA-Q, but not
other NUMA machines.

> /* Multi-Quad has an extended PCI Conf1 */
> - if(clustered_apic_mode == CLUSTERED_APIC_NUMAQ)
> + if(configured_platform_type == CONFIGURED_PLATFORM_NUMA)

If that's the direct substitution you're trying to make, don't misname
NUMAQ stuff as NUMA - very confusing ...

OK ... I give up trying to read the rest of it until you explain the
switching rules you're trying to use ... perhaps they're just confusingly
named, but it looks all wrong to me ...

M.

2002-12-19 00:57:07

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs


I have started working on a similar patch for 2.5. Other thing in my todo list is to
split this patch up into chunks.

Other comments inlined below.

> From: Christoph Hellwig [mailto:[email protected]]
> On Wed, Dec 18, 2002 at 02:36:20PM -0800, Pallipadi, Venkatesh wrote:
> > xAPIC support can actually be determined from the LAPIC version.
>
> Are you sure? IIRC some of the early summit boxens didn't report the
> right versions..
> does this really not break anything in the fragile summit setups?

I am not really sure about the local APIC versions in summit. What I remember seeing on
lkml was summit has older IOAPIC version. Can someone clarify this?

> Okay, this was wrong before, but any chance you could use proper
> style here (i.e. if ()
> Again.

oops.. I somehow missed these 'if' coding style stuff. changing it immediately.

> > + define_bool CONFIG_X86_CLUSTERED_APIC y
> Do we really need CONFIG_X86_APIC_CLUSTER _and_ CONFIG_X86_CLUSTERED_APIC?

I will also eliminate CONFIG_X86_APIC_CLUSTER and use CONFIG_X86_CLUSTERED_APIC directly.

>
> - if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
> + if (clustered_apic_mode &&
> + (configured_platform_type ==
> CONFIGURED_PLATFORM_NUMA) ) {
>
> Doesn;t configured_platform_type == CONFIGURED_PLATFORM_NUMA imply
> clustered_apic_mode? and it should be at least
> CONFIGURED_PLATFORM_NUMAQ,
> btw. Probably better something short like SUBARCH_NUMAQ..

Yes, right now CONFIGURED_PLATFORM_NUMA implies clustered_apic_mode, and one of the
checks in the above 'if' is redundant. Will do a search and replace of NUMA by NUMAQ.


Thanks,
Venkatesh

2002-12-19 01:25:26

by James Cleverdon

[permalink] [raw]
Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

On Wednesday 18 December 2002 05:05 pm, Pallipadi, Venkatesh wrote:
> I have started working on a similar patch for 2.5. Other thing in my todo
> list is to split this patch up into chunks.
>
> Other comments inlined below.
>
> > From: Christoph Hellwig [mailto:[email protected]]
> >
> > On Wed, Dec 18, 2002 at 02:36:20PM -0800, Pallipadi, Venkatesh wrote:
> > > xAPIC support can actually be determined from the LAPIC version.
> >
> > Are you sure? IIRC some of the early summit boxens didn't report the
> > right versions..
> > does this really not break anything in the fragile summit setups?
>
> I am not really sure about the local APIC versions in summit. What I
> remember seeing on lkml was summit has older IOAPIC version. Can someone
> clarify this?

Sure, I can verify it. The I/O APICs in shipped summit chipsets contains a
version ID of 0x11 instead of 0x14 to 0x1F. The high performance folks
claimed that Intel specified 0x14 for the local APICs, but left their orange
jacket docs saying 0x1X for I/O APICs until after the chips taped out.

Whatever. In any case, there are boxes in the field that contain those
version numbers. We can recognize them using the OEM and product strings in
the MPS and ACPI tables, so it's only an annoyance.


> > Okay, this was wrong before, but any chance you could use proper
> > style here (i.e. if ()
> > Again.
>
> oops.. I somehow missed these 'if' coding style stuff. changing it
> immediately.
>
> > > + define_bool CONFIG_X86_CLUSTERED_APIC y
> >
> > Do we really need CONFIG_X86_APIC_CLUSTER _and_
> > CONFIG_X86_CLUSTERED_APIC?
>
> I will also eliminate CONFIG_X86_APIC_CLUSTER and use
> CONFIG_X86_CLUSTERED_APIC directly.
>
> > - if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
> > + if (clustered_apic_mode &&
> > + (configured_platform_type ==
> > CONFIGURED_PLATFORM_NUMA) ) {
> >
> > Doesn;t configured_platform_type == CONFIGURED_PLATFORM_NUMA imply
> > clustered_apic_mode? and it should be at least
> > CONFIGURED_PLATFORM_NUMAQ,
> > btw. Probably better something short like SUBARCH_NUMAQ..
>
> Yes, right now CONFIGURED_PLATFORM_NUMA implies clustered_apic_mode, and
> one of the checks in the above 'if' is redundant. Will do a search and
> replace of NUMA by NUMAQ.
>
>
> Thanks,
> Venkatesh

--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com

2002-12-19 02:27:33

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs


The rules that I am trying to follow

numaq summit/ all other
other >8 CPU system systems
------------------------------------------------------------------------------
clustered_apic_mode CLUSTERED CLUSTERED NONE
configured_platform_type NUMAQ NONE NONE
------------------------------------------------------------------------------
Note that in the patch, wherever I said NUMA, I actually meant NUMAQ. I think I lost
that Q, while I was trying to reduce the length of this variable
(CONFIGURED_PLATFORM_NUMAQ) :). Sorry about all the resulting confusion. Doing a
search and replace of NUMA by NUMAQ on my patch right now.

Noticeable changes here are
- summit using CLUSTERED in place of XAPIC(Physical destination).
- use "configured_platform_type" it basically separate out numaq specific stuff
(like, waking up the CPUs through NMI), from the generic cluster apic support.

We are trying to use a common APIC destination mode for all systems with more
than 8 CPUs. This is by having the logical clusters of the CPUs. I am hoping that
this mode works fine on summit. Another option is to allow summit to continue using
physical mode, if there is any binding reason to do so. But anyway NUMAQ specific stuff
has to be separated from cluster APIC stuff.

Rest of the comments inlined below..

> From: Martin J. Bligh [mailto:[email protected]]
> > - define_bool CONFIG_X86_CLUSTERED_APIC y
> > + define_bool CONFIG_MULTIQUAD y
>
> You seem to have lost turning on CONFIG_X86_NUMA.

I dont see CONFIG_X86_NUMA getting used anywhere in 2.4.21-pre1. Am I missing
something here??

> > +CONFIG_X86_APIC_CLUSTER=y
> > # CONFIG_MULTIQUAD is not set
> > CONFIG_HAVE_DEC_LOCK=y
>
> Errrm ... on by default?

I was just trying to be little ambitious :). Will remove that now..

> > - if(clustered_apic_mode == CLUSTERED_APIC_XAPIC)
> > - id =
> physical_to_logical_apicid(hard_smp_processor_id());
> > + if(clustered_apic_mode)
> > + id = cpu_2_logical_apicid[smp_processor_id()];
>
> Don't use those arrays directly, use the macros.

OK. Will change it.

> And that was off before for NUMA-Q ... you seem to have turned it on.
> Unless you've inverted the meaning of clustered_apic_mode, which is
> going to confuse the hell out of everyone?

NO. This check is happening inside calculate_ldr() routine, and NUMAQ never comes
to calculate_ldr(), as (according to the comments), it is the BIOS that programs
LDR in NUMAQ. So only thing we have to worry about in calculate_ldr() is
non-NUMAQ systems.

> > - if (clustered_apic_mode != CLUSTERED_APIC_NUMAQ) {
> > + if (configured_platform_type != CONFIGURED_PLATFORM_NUMA) {
>
> OK, what exactly are your switching rules here? Before:

I am trying to get the stuff which are _only_ specific to NUMAQ, under
"platform type" check. And the stuff specific to cluster APIC setup under
"apic mode" check.
Can you please review the complete patch now. I am not sure whether my explaination
was clear enough. Let me know if have any questions.

Thanks,
-Venkatesh

2002-12-19 02:37:16

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs


> From: James Cleverdon [mailto:[email protected]]
> > On Wednesday 18 December 2002 05:05 pm, Pallipadi, Venkatesh wrote:
> > I am not really sure about the local APIC versions in summit. What I
> > remember seeing on lkml was summit has older IOAPIC
> version. Can someone
> > clarify this?
>
> Sure, I can verify it. The I/O APICs in shipped summit
> chipsets contains a
> version ID of 0x11 instead of 0x14 to 0x1F. The high
> performance folks
> claimed that Intel specified 0x14 for the local APICs, but
> left their orange
> jacket docs saying 0x1X for I/O APICs until after the chips taped out.
>
> Whatever. In any case, there are boxes in the field that
> contain those
> version numbers. We can recognize them using the OEM and
> product strings in
> the MPS and ACPI tables, so it's only an annoyance.
>

OK. In my patch I am looking at local APIC version > 0x14, to check xAPIC support.
This should work on all systems irrespective of IOAPIC version.
And even if there are problems here for summit, we can workaround it, by simply
forcing xAPIC support at already existing OEM string check.


Thanks,
-Venkatesh

2002-12-19 03:02:19

by Martin J. Bligh

[permalink] [raw]
Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

Apologies if I'm somewhat jumpy - bear in mind that I've burnt a lot of
hours
and torn out hair recently breaking up all the Summit patches (which I sent
you, but aren't generally published in their entirety yet). I've also got a
hell of a lot of time invested in making this area of code work as it is
now ;-)
(as have others). Going through another iteration isn't top on my list of
favourite things to do right now ... what you're trying to do does actually
seem
sane in general, I'm just not sure I like the method - probably fairly easy
to
fix.

As a general approach thing, it would be much smoother if you took the path
of trying to make it totally obvious that whatever you're trying to change
doesn't hurt anyone else. Part of that is small easily readable patches,
part of that is choosing your constructs carefully, explaining what you're
doing and why it's safe, and not turning changes on by default ;-)

> numaq summit/ all other
> other >8 CPU system systems
> -------------------------------------------------------------------------
> ----- clustered_apic_mode CLUSTERED CLUSTERED NONE
> configured_platform_type NUMAQ NONE NONE
> -------------------------------------------------------------------------

OK, so you still seem to have a tristate here. What does this gain us over
the existing scheme?

clustered_apic_mode == CLUSTERED_APIC_NUMAQ (equiv CLUSTERED / NUMAQ)
clustered_apic_mode == CLUSTERED_APIC_XAPIC (equiv CLUSTERED / NONE)
clustered_apic_mode == CLUSTERED_APIC_NONE (equiv NONE / NONE)

Or are their other situations you haven't outlined above?

> ----- Note that in the patch, wherever I said NUMA, I actually meant
> NUMAQ. I think I lost that Q, while I was trying to reduce the length of
> this variable (CONFIGURED_PLATFORM_NUMAQ) :). Sorry about all the
> resulting confusion. Doing a search and replace of NUMA by NUMAQ on my
> patch right now.

Cool, that starts to make a little more sense to me now then ...

> Noticeable changes here are
> - summit using CLUSTERED in place of XAPIC(Physical destination).
> - use "configured_platform_type" it basically separate out numaq specific
> stuff (like, waking up the CPUs through NMI), from the generic cluster
> apic support.

Right ... that's all my fault. I started off with clustered_apic_mode as
a simple boolean switch to represent the P3's clustered apic mode ... then
abused it for anything NUMA-Q specific. Stuff that's numaq specific should
be changed to something like "if (x86_numaq)" instead of "if
(clustered_apic_mode == ...)".
If we only have one platform type, it's going to make much more readable
code
to just use a boolean here. Somewhere in a header file (where
clustered_apic_mode
is defined):

#ifdef CONFIG_X86_NUMAQ
#define x86_numaq (1)
#else
#define x86_numaq (0)
#endif

Just makes the resultant c-code easier to read, and shoves all the ifdefs
in a
header file. I used to think that people complaining about ifdefs in code
was
annoying, but having tried to read the results of ifdef hell ... I rapidly
came
to the conclusion they're right ... the whole apic handling code area is
enough to
make your head hurt even if it's as readable as possible ;-)

I can't deny that the current code has a few problems re style and
cleanliness ...
I've been off doing 2.5 for ages, and that will be pretty clean after it's
broken into subarch.

> We are trying to use a common APIC destination mode for all systems with
> more than 8 CPUs. This is by having the logical clusters of the CPUs. I
> am hoping that this mode works fine on summit. Another option is to allow
> summit to continue using physical mode, if there is any binding reason
> to do so. But anyway NUMAQ specific stuff has to be separated from
> cluster APIC stuff.

Look at the 2.5 summit stuff I sent you - I think 2.5 uses logical on
Summit.
Historical split ... don't ask.

>> You seem to have lost turning on CONFIG_X86_NUMA.
>
> I dont see CONFIG_X86_NUMA getting used anywhere in 2.4.21-pre1. Am I
> missing something here??

I think it's used in some patches floating around ... as a general rule,
please don't delete stuff as cleanups at the same time as adding new
features,
the resultant tangle is very hard to verify correct (I have the scars from
trying to untangle such things, and they're fresh and they hurt ;-))

>> Errrm ... on by default?
>
> I was just trying to be little ambitious :). Will remove that now..

Cool ;-)

>> And that was off before for NUMA-Q ... you seem to have turned it on.
>> Unless you've inverted the meaning of clustered_apic_mode, which is
>> going to confuse the hell out of everyone?
>
> NO. This check is happening inside calculate_ldr() routine, and NUMAQ
> never comes to calculate_ldr(), as (according to the comments), it is the
> BIOS that programs LDR in NUMAQ. So only thing we have to worry about in
> calculate_ldr() is non-NUMAQ systems.

I don't have a view in front of me, but the fact there's a
"if (clustered_apic_mode != CLUSTERED_APIC_NUMAQ)" in there right now
makes me suspect it's needed. It's *possible* it's unneeded, but I'm
suspicious.

> I am trying to get the stuff which are _only_ specific to NUMAQ, under
> "platform type" check. And the stuff specific to cluster APIC setup under
> "apic mode" check.

See above ... I'm not sure you need anything this invasive to do what you
want. However, I do have a better idea what you're trying to do now ;-)

> Can you please review the complete patch now. I am not sure whether my
> explaination was clear enough. Let me know if have any questions.

I'll try to go through the updated one when you send it. Hint: smaller
patches with explanations of why they're safe are much easier to read.
Andrew Morton has managed to beat this lesson into me after a while ;-)

M.

2002-12-19 04:06:51

by James Cleverdon

[permalink] [raw]
Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

On Wednesday 18 December 2002 06:45 pm, Pallipadi, Venkatesh wrote:
> > From: James Cleverdon [mailto:[email protected]]
> >
> > > On Wednesday 18 December 2002 05:05 pm, Pallipadi, Venkatesh wrote:
> > > I am not really sure about the local APIC versions in summit. What I
> > > remember seeing on lkml was summit has older IOAPIC
> >
> > version. Can someone
> >
> > > clarify this?
> >
> > Sure, I can verify it. The I/O APICs in shipped summit
> > chipsets contains a
> > version ID of 0x11 instead of 0x14 to 0x1F. The high
> > performance folks
> > claimed that Intel specified 0x14 for the local APICs, but
> > left their orange
> > jacket docs saying 0x1X for I/O APICs until after the chips taped out.
> >
> > Whatever. In any case, there are boxes in the field that
> > contain those
> > version numbers. We can recognize them using the OEM and
> > product strings in
> > the MPS and ACPI tables, so it's only an annoyance.
>
> OK. In my patch I am looking at local APIC version > 0x14, to check xAPIC
> support. This should work on all systems irrespective of IOAPIC version.
> And even if there are problems here for summit, we can workaround it, by
> simply forcing xAPIC support at already existing OEM string check.
>
>
> Thanks,
> -Venkatesh

Yes, such a scheme should work fine. (Had something like that in my patch,
but it was cut out to avoid any chance of breaking flat P4 boxen.)

Once you've determined that you have a system with xAPICs, how do you intend
to distinguish between those delivering interrupts through the serial bus vs.
the system bus? (Correct me if I'm wrong, but your patch didn't define the
new I/O APIC register that contains the serial/parallel status bit.)

How will you decide between a box that always must use clustered APIC mode,
like the x440, vs. one which can be operated in flat mode, like the x360? (I
only include the x360 in my summit patch to avoid having all the interrupts
hit CPU 0. Otherwise, it is a flat box that delivers interrupts through the
system bus.) What about other flat P4 systems?

--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com

2002-12-20 01:57:19

by James Cleverdon

[permalink] [raw]
Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

On Wednesday 18 December 2002 02:36 pm, Pallipadi, Venkatesh wrote:
> 2.4.21-pre1 (i386) based patch to fix the issues with systems having more
> than 8 CPUs, in a generic way.
>
>
> Motivation:
> The current APIC destination mode ("Flat Logical") used in linux kernel has
> an upper limit of 8 CPUs. For more than 8 CPUs, either "Clustered Logical"
> or "Physical" mode has to be used.
> There is already some code in current kernel (2.4.21-pre1), to support such
> conditions. Specifically, IBM Summit, uses Physical mode, and IBM NUMAQ
> uses clustered mode. But, there are some issues too:
> - it is not generic enough to support any other more than 8 CPU system
> out of the box. Supporting different systems may need more hacks in the
> code. - clustered mode setup is tightly coupled with NUMA system. Whereas,
> in reality, we can as well have logical clusters in a non-NUMA system as
> well. - physical mode setup is somewhat tightly coupled with xAPIC. But,
> xAPIC doesn't necessarily imply physical mode. You can as well have
> clustered mode with xAPIC - APIC destination mode is selected based on
> particular OEM string.
>
> These reasons together led to panics on other OEM systems with > 8 CPUS.
> The patch tries to fix this issue in a generic way (in place of having
> multiple hacks for different OEMs). Note, the patch only intends to change
> the initialization of systems with more than 8 CPUs and it will not affect
> other systems (apart from possible bugs in my code itself).

Your goals are laudable and, in many ways, I share them. However, this is
2.4. Our goals for 2.4 have always been minimal changes and as little impact
as possible to the UP and flat SMP cases.

> Description:
> The basic idea is to use the number of processors detected on the system,
> to decide on which APIC destination mode is to be used. Once all the CPU
> info, is collected either from ACPI or MP table, we can check the total
> number of processors in the system.
> If the number of processors in less than equal to 8,
> then no change is required, and we can use the default, "Flat Logical"
> set up. If the number of processors is more than 8
> we can switch to clustered logical setup.
> The logical clusters set up as follows.
> Cluster 0 (CPU 0-3), Cluster 1 (CPU 4-7), Cluster 2 (CPU 8-11) and so
> on..
>
> The other things that are done in the patch include:
> - Separate out the NUMA specific stuff from APIC setup in cluster mode.
> Also, NUMA has its own way of setting up the clusters, and doesn't follow
> the logical cluster mapping defined above.
> - Separate out xAPIC stuff from APIC destination setup. And the
> availability of xAPIC support can actually be determined from the LAPIC
> version. - physical mode support _removed_, as we can use clustered logical
> setup to support can support upto a max of 60 CPUs. This is mainly because
> of the advantage of being able to setup IOAPICs in LowestPrio, when using
> clustered mode.

See my 2.5 patches for an example of using solely logical mode interrupts.
The patches submitted to 2.4 are those that have been in Alan's tree since
August and running in SuSE 8.0+ since 8.0's release.

> The whole stuff is protected by 'Clustered APIC (> 8 CPUs) support
> (CONFIG_X86_APIC_CLUSTER)' config option under Processor Type and Features.
> But going forward, we can actually make this as default, as this doesn't
> affect the systems with less than equal to 8 CPUs (Apart from minor
> increase in code size and couple of additional checks during boot up), but
> gives the default support to more than 8 CPU systems.

An x440 needs to use clustered APIC mode whenever more than two physical CPUs
are present. Consider scanning the list of physical APIC IDs to determine
whether clustered mode is necessary. (I had such code in my 2.5 patch, but
ripped it out when it falsely triggered on a couple oddball systems. You may
be able to write a more comprehensive analyzer.)

> Please let me know your comments/criticisms about this.
> I was able to test this successfully on an 8-way with HT(16 logical)
> CPU systems that I have access to. But, I haven't tested it on x440, or
> NUMAQ systems. Would love to hear about the effect of this patch on these
> systems.
>
> Thanks,
> -Venkatesh

A generic patch should also support Unisys' new box, the ES7000 or some such.

> > -----Original Message-----
> > From: Nakajima, Jun
> > Sent: Thursday, December 12, 2002 7:06 PM
> > To: [email protected]; Zwane Mwaikambo
> > Cc: Martin Bligh; John Stultz; Linux Kernel
> > Subject: RE: [PATCH][2.5][RFC] Using xAPIC apic address space
> > on !Summit
> >
> >
> > BTW, we are working on a xAPIC patch that supports more than
> > 8 CPUs in a
> > generic fashion (don't use hardcode OEM checking). We already
> > tested it on
> > two OEM systems with 16 CPUs.
> > - It uses clustered mode. We don't want to use physical mode
> > because it does
> > not support lowest priority delivery mode.
> > - We also check the version of the local APIC if it's xAPIC
> > or not. It's
> > possible that some other x86 architecture (other than P4P) uses xAPIC.
> >
> > Stay tuned.
> >
> > Jun

--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com

2002-12-20 07:52:54

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

On Thu, Dec 19, 2002 at 06:04:55PM -0800, James Cleverdon wrote:
> A generic patch should also support Unisys' new box, the ES7000 or some such.

That box needs more changes than just the apic setup. Unfortunately
unisys thinks they shouldn't send their patches to lkml, but when you see
them e.g. in the suse tree it's a bit understandable that they don't want
anyone to really see their mess :)

And btw, the box isn't that new, but three years ago or so when they first
showed it on cebit they even refused to talk about linux due to their
restrictive agreements with Microsoft..

2002-12-20 11:16:47

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

On Thu, Dec 19, 2002 at 06:04:55PM -0800, James Cleverdon wrote:
>> A generic patch should also support Unisys' new box, the ES7000 or
>> some such.

On Fri, Dec 20, 2002 at 08:00:50AM +0000, Christoph Hellwig wrote:
> That box needs more changes than just the apic setup. Unfortunately
> unisys thinks they shouldn't send their patches to lkml, but when you see
> them e.g. in the suse tree it's a bit understandable that they don't want
> anyone to really see their mess :)
> And btw, the box isn't that new, but three years ago or so when they first
> showed it on cebit they even refused to talk about linux due to their
> restrictive agreements with Microsoft..

Kevin, you're the only lkml-posting contact point I know of within Unisys.
Is there any chance you could flag down some of the ia32 crew there for
some commentary on this stuff? (or do so yourself if you're in it)


Thanks,
Bill

2002-12-20 11:22:08

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

On Thu, Dec 19, 2002 at 06:04:55PM -0800, James Cleverdon wrote:
>>> A generic patch should also support Unisys' new box, the ES7000 or
>>> some such.

On Fri, Dec 20, 2002 at 08:00:50AM +0000, Christoph Hellwig wrote:
>> That box needs more changes than just the apic setup. Unfortunately
>> unisys thinks they shouldn't send their patches to lkml, but when you see
>> them e.g. in the suse tree it's a bit understandable that they don't want
>> anyone to really see their mess :)
>> And btw, the box isn't that new, but three years ago or so when they first
>> showed it on cebit they even refused to talk about linux due to their
>> restrictive agreements with Microsoft..

On Fri, Dec 20, 2002 at 03:24:01AM -0800, William Lee Irwin III wrote:
> Kevin, you're the only lkml-posting contact point I know of within Unisys.
> Is there any chance you could flag down some of the ia32 crew there for
> some commentary on this stuff? (or do so yourself if you're in it)

Sorry for the meaningless post -- I forgot to add Kevin to the cc: list.


Bill

2002-12-21 03:19:00

by Nakajima, Jun

[permalink] [raw]
Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

I share the same concerns and comments with Martin.

As far as xAPIC mode is concerned, the changes for ES7000 in SuSe/United Linux are simply activating physical mode. And we are confident the patch we provided should work for the machine as well. Looks like ES7000 requires changes in other areas as well, though.

Since Martin already has code in place in 2.5, we should reuse his code as much as possible. And our current plan is:

For 2.5:

- Martin posts a new patch (that moves IBM-specifc stuff to subarch, for example) next week.
- Venkatesh merges the generic cluster APIC support for systems (with more than 8 CPUs) to it, testing it on some OEM machines (I cannot tell which)

For 2.4:
- Venkatesh will post a confined patch to support APIC physical mode.

Thanks,
Jun


> -----Original Message-----
> From: Martin J. Bligh [mailto:[email protected]]
> Sent: Friday, December 20, 2002 8:30 AM
> To: Van Maren, Kevin; 'William Lee Irwin III'; Christoph Hellwig; James
> Cleverdon; Pallipadi, Venkatesh; Linux Kernel; John Stultz; Nakajima, Jun;
> Mallick, Asit K; Saxena, Sunil
> Cc: Protasevich, Natalie
> Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with m
> ore than 8 CPUs
>
> > Natalie is the engineer who added support for the ES7000 to Linux.
> > Fortunately she is in the cube next to me.
> >
> > She has sent the patches to SuSe/United Linux, and is in the final
> process
> > of testing them on 2.5.5x before submitting them to LKML for comment.
>
> Are they under subarch or somehow abstracted this time, or are there
> going to be 10,000 ifdef's again? If the latter, I can predict what
> the first set of review comments you get are going to be ;-)
>
> M.

2002-12-21 04:36:06

by Martin J. Bligh

[permalink] [raw]
Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

> I share the same concerns and comments with Martin.
>
> As far as xAPIC mode is concerned, the changes for ES7000 in SuSe/United
> Linux are simply activating physical mode. And we are confident the patch
> we provided should work for the machine as well. Looks like ES7000
> requires changes in other areas as well, though.
>
> Since Martin already has code in place in 2.5, we should reuse his code
> as much as possible. And our current plan is:

I should point out that James wrote most of the Summit code, not me (I did
the original NUMA-Q code) - I'm splitting it out into manageable chunks
and debugging it (it breaks NUMA-Q at the moment, but I think that's fixed
now it's in nice bite-sized pieces).

> For 2.5:
>
> - Martin posts a new patch (that moves IBM-specifc stuff to subarch, for
> example) next week. - Venkatesh merges the generic cluster APIC support
> for systems (with more than 8 CPUs) to it, testing it on some OEM
> machines (I cannot tell which)

Excellent - thankyou for this. When it's abstracted out (as the patches
I'll send out as soon as I've got them tested do), it should be much
easier to merge things together.

> For 2.4:
> - Venkatesh will post a confined patch to support APIC physical mode.

That should be what the current 2.4 summit code uses ... oddly it's
physical in 2.4, and logical in 2.5 ... don't ask why ;-) If you meant
logical, that sounds like a good plan.

M.