2002-10-05 22:19:52

by Erich Focht

Subject: [PATCH] topology for ia64

Hi David,

please find attached a first attempt to implement the topology.h
macros/routines for IA64. We need this for the NUMA scheduler setup.

Thanks!
Best regards,
Erich


Attachments:
2.5.39_topology-ia64.patch (4.11 kB)

2002-10-22 00:03:06

by David Mosberger

Subject: Re: [PATCH] topology for ia64

>>>>> On Sat, 5 Oct 2002 19:04:22 +0200, Erich Focht <[email protected]> said:

Erich> Hi David, please find attached a first attempt to implement
Erich> the topology.h macros/routines for IA64. We need this for the
Erich> NUMA scheduler setup.

Why does the cpu_to_node_map[] exist for non-NUMA configurations? It
seems to me that it would be better to make cpu_to_node_map a macro
that uses an array-check for NUMA configurations and a simple test
against phys_cpu_present_map() for non-NUMA.
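Roughly something like this (untested sketch, assuming phys_cpu_present_map
is the usual unsigned long bitmap):

#ifdef CONFIG_NUMA
extern u8 cpu_to_node_map[NR_CPUS];	/* filled in at boot */
#define __cpu_to_node(cpu)	(cpu_to_node_map[(cpu)])
#else
/* non-NUMA: every present CPU is on node 0 */
#define __cpu_to_node(cpu) \
	(((phys_cpu_present_map >> (cpu)) & 1UL) ? 0 : -1)
#endif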

--david

2002-10-22 09:17:40

by Erich Focht

Subject: Re: [PATCH] topology for ia64

On Tuesday 22 October 2002 02:07, David Mosberger wrote:
> Why does the cpu_to_node_map[] exist for non-NUMA configurations? It
> seems to me that it would be better to make cpu_to_node_map a macro
> that uses an array-check for NUMA configurations and a simple test
> against phys_cpu_present_map() for non-NUMA.

Attached is a modified patch for implementing the topology stuff on ia64.
It's on top of your 2.5.39 tree including acpi_numa and the acpi_numa fix
which I've sent you separately.

I dropped the cpu_to_node_map array for the non-NUMA case. The macro
__cpu_to_node() returns 0 in this case. In the places where it is used
(e.g. in the NUMA scheduler), we either run on a valid CPU or have
cpu_online() checks before using it; therefore I also removed the
phys_cpu_present_map check when building the cpu-to-node map.
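In header terms the non-NUMA case then reduces to (sketch, modulo the
exact spelling in the patch):

#ifndef CONFIG_NUMA
/* single node: no lookup table, no present-map check needed */
#define __cpu_to_node(cpu)	(0)
#endif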

Hope this can be included now...

Regards,
Erich


Attachments:
00_topology-ia64-2.5.39-2.patch (4.29 kB)

2002-10-29 22:17:29

by Matthew Dobson

Subject: Re: [PATCH] topology for ia64

Erich Focht wrote:
> On Tuesday 22 October 2002 02:07, David Mosberger wrote:
>
>>Why does the cpu_to_node_map[] exist for non-NUMA configurations? It
>>seems to me that it would be better to make cpu_to_node_map a macro
>>that uses an array-check for NUMA configurations and a simple test
>>against phys_cpu_present_map() for non-NUMA.
>
> Attached is a modified patch for implementing the topology stuff on ia64.
> It's on top of your 2.5.39 tree including acpi_numa and the acpi_numa fix
> which I've sent you separately.
>
> I dropped the cpu_to_node_map array for the non-NUMA case. The macro
> __cpu_to_node() returns 0 in this case. In the places where it is used
> (e.g. in the NUMA scheduler) we either run on a valid CPU or have
> cpu_online() checks before using it, therefore I also removed the
> phys_cpu_present_map check when building the cpu to node map.

Hi Erich! Apologies for the long response delay... I think our mail
server must be a bit lagged. ;)

It looks good to me. As for this comment:
+/*
+ * Returns the number of the first CPU on Node 'node'.
+ * Slow in the current implementation.
+ * Who needs this?
+ */
+/* #define __node_to_first_cpu(node) pool_cpus[pool_ptr[node]] */
+static inline int __node_to_first_cpu(int node)

No one is using it now. I think that I will probably deprecate this
function in the near future as it is pretty useless. Anyone looking for
that functionality can just do an __ffs(__node_to_cpu_mask(node))
instead, and hope that there is a reasonably quick implementation of
__node_to_cpu_mask.
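
I.e. something like (sketch, assuming the returned mask is a single,
non-empty word):

/* sketch: first CPU on 'node' via the generic mask accessor;
 * __ffs() is undefined for an empty mask, so the node must
 * have at least one CPU
 */
static inline int __node_to_first_cpu(int node)
{
	return __ffs(__node_to_cpu_mask(node));
}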


> Hope this can be included now...

I agree! Linus or another maintainer, please pick this up. These
macros should be implemented intelligently on as many architectures as
possible, now that they're beginning to be used in more and more places.

Cheers!

-Matt

>
> Regards,
> Erich
>
> <patch snip>
>

2002-10-29 22:33:06

by Michael Hohnbaum

Subject: Re: [Linux-ia64] Re: [PATCH] topology for ia64

On Tue, 2002-10-29 at 14:19, Matthew Dobson wrote:
> Erich Focht wrote:
> +/*
> + * Returns the number of the first CPU on Node 'node'.
> + * Slow in the current implementation.
> + * Who needs this?
> + */
> +/* #define __node_to_first_cpu(node) pool_cpus[pool_ptr[node]] */
> +static inline int __node_to_first_cpu(int node)
>
> No one is using it now. I think that I will probably deprecate this
> function in the near future as it is pretty useless. Anyone looking for
> that functionality can just do an __ffs(__node_to_cpu_mask(node))
> instead, and hope that there is a reasonably quick implementation of
> __node_to_cpu_mask.
>
I'm using this in the simple NUMA scheduler. It is quite useful
for iterating through a specific node's CPUs. Yes, the functionality
can be obtained in a different manner, but that way is less obvious.
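Without it, walking a node's CPUs ends up as something like this
(rough, untested sketch):

int cpu;

for (cpu = 0; cpu < NR_CPUS; cpu++) {
	if (!cpu_online(cpu) || __cpu_to_node(cpu) != node)
		continue;
	/* ... work on this node's CPU ... */
}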

--

Michael Hohnbaum 503-578-5486
[email protected] T/L 775-5486

2002-10-29 22:41:35

by William Lee Irwin III

Subject: Re: [Linux-ia64] Re: [PATCH] topology for ia64

On Tue, Oct 29, 2002 at 02:19:25PM -0800, Matthew Dobson wrote:
> +/*
> + * Returns the number of the first CPU on Node 'node'.
> + * Slow in the current implementation.
> + * Who needs this?
> + */
> +/* #define __node_to_first_cpu(node) pool_cpus[pool_ptr[node]] */
> +static inline int __node_to_first_cpu(int node)

So far so safe... though no obvious use of it.


On Tue, Oct 29, 2002 at 02:19:25PM -0800, Matthew Dobson wrote:
> No one is using it now. I think that I will probably deprecate this
> function in the near future as it is pretty useless. Anyone looking for
> that functionality can just do an __ffs(__node_to_cpu_mask(node))
> instead, and hope that there is a reasonably quick implementation of
> __node_to_cpu_mask.

This assumes the value returned by __node_to_cpu_mask() is a single word.
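For a multi-word mask it would have to be spelled out with
find_first_bit(), e.g. (sketch; __node_to_cpu_mask_ptr() is a
hypothetical accessor returning a pointer to the node's bitmap):

/* sketch: first CPU on 'node' when the mask is wider than one word */
static inline int __node_to_first_cpu(int node)
{
	unsigned long *mask = __node_to_cpu_mask_ptr(node);	/* hypothetical */

	return find_first_bit(mask, NR_CPUS);
}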


Bill

2002-10-29 23:37:15

by Erich Focht

Subject: Re: [PATCH] topology for ia64

On Tuesday 29 October 2002 23:19, Matthew Dobson wrote:
> Hi Erich! Apologies for the long response delay... I think our mail
> server must be a bit lagged. ;)

Should I use another email address?

> It looks good to me. As far as this comment:
> +/*
> + * Returns the number of the first CPU on Node 'node'.
> + * Slow in the current implementation.
> + * Who needs this?
> + */
> +/* #define __node_to_first_cpu(node) pool_cpus[pool_ptr[node]] */
> +static inline int __node_to_first_cpu(int node)
>
> No one is using it now. I think that I will probably deprecate this
> function in the near future as it is pretty useless. Anyone looking for
> that functionality can just do an __ffs(__node_to_cpu_mask(node))
> instead, and hope that there is a reasonably quick implementation of
> __node_to_cpu_mask.

Yes, meanwhile I know of one user. The problem I see is that, as far
as I understand, the CPUs are not sorted by node number. The NUMA API
doesn't require that, I think. So finding the first CPU in a node is
not really useful for looping over the CPUs of only one node.

Thinking about further developments of the NUMA API, I'd suggest
two additions:

1: Add a list of the CPUs sorted by node, and a pointer array into
that list marking the first CPU of each node. Like

int node_cpus[NR_CPUS];
int node_first_ptr[MAX_NUMNODES+1];

(or macros, doesn't matter).

Example: 2 nodes:
node_cpus : 0 1 4 5 2 3 6 7
node      : 0 0 0 0 1 1 1 1
pointer   : ^       ^       ^
=> node_first_ptr[]: 0 4 8

One can initialize this easily by using the __cpu_to_node() macro
(a rough sketch follows after the loop below). And with this you can
loop over the CPUs of one node by doing:

for (i = node_first_ptr[node]; i < node_first_ptr[node+1]; i++) {
	cpu = node_cpus[i];
	/* ... do stuff with cpu ... */
}
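
The initialization could look roughly like this (untested sketch,
ordering the online CPUs by node; numnodes is the usual node count):

int i, node, idx = 0;

for (node = 0; node < numnodes; node++) {
	node_first_ptr[node] = idx;
	for (i = 0; i < NR_CPUS; i++)
		if (cpu_online(i) && __cpu_to_node(i) == node)
			node_cpus[idx++] = i;
}
node_first_ptr[numnodes] = idx;	/* sentinel: one past the last CPU */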


2: In ACPI there is a table describing the latency ratios between the
nodes. It is called SLIT (System Locality Information Table). On an
8-node system (NEC TX7) this is a matrix of dimension numnodes*numnodes:

int __node_distance[8 * 8] = {
	10, 15, 15, 15, 20, 20, 20, 20,
	15, 10, 15, 15, 20, 20, 20, 20,
	15, 15, 10, 15, 20, 20, 20, 20,
	15, 15, 15, 10, 20, 20, 20, 20,
	20, 20, 20, 20, 10, 15, 15, 15,
	20, 20, 20, 20, 15, 10, 15, 15,
	20, 20, 20, 20, 15, 15, 10, 15,
	20, 20, 20, 20, 15, 15, 15, 10
};

#define node_distance(i,j) __node_distance[(i)*8+(j)]

This means:
node_distance(i,i) = 10 (i==j: same node, lowest latency)
node_distance(i,j) = 15 (i!=j: different node, same supernode)
node_distance(i,j) = 20 (i!=j: different node, different supernode)

This macro or function describes the NUMA topology of a multi-level
system very well. As the table comes for free with NUMA systems with
ACPI (sooner or later they'll all have it, especially if they want
to run Windows, too), it would be easy to take it into topology.h.
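
As an example of what it enables, a scheduler could pick the nearest
remote node like this (untested sketch, using the node_distance()
macro from above):

static int nearest_node(int node)
{
	int i, best = node, best_dist = INT_MAX;

	for (i = 0; i < numnodes; i++) {
		if (i == node)
			continue;
		if (node_distance(node, i) < best_dist) {
			best_dist = node_distance(node, i);
			best = i;
		}
	}
	return best;
}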

Regards,
Erich