we don't need copy too. already have x86_cpu_to_node_map
Signed-off-by: Yinghai Lu <[email protected]>
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 1aecc65..69b33e3 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -546,8 +546,6 @@ void __cpuinit numa_set_node(int cpu, int node)
{
int *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
- cpu_pda(cpu)->nodenumber = node;
-
if(cpu_to_node_map)
cpu_to_node_map[cpu] = node;
else if(per_cpu_offset(cpu))
diff --git a/include/asm-x86/pda.h b/include/asm-x86/pda.h
index c0305bf..d9dc209 100644
--- a/include/asm-x86/pda.h
+++ b/include/asm-x86/pda.h
@@ -22,7 +22,6 @@ struct x8664_pda {
offset 40!!! */
#endif
char *irqstackptr;
- unsigned int nodenumber; /* number of current node */
unsigned int __softirq_pending;
unsigned int __nmi_count; /* number of NMI on this CPUs */
short mmu_state;
On Sat, 16 Feb 2008, Yinghai Lu wrote:
> we don't need copy too. already have x86_cpu_to_node_map
>
> Signed-off-by: Yinghai Lu <[email protected]>
Applied. Thanks,
tglx
Yinghai Lu <[email protected]> writes:
> we don't need copy too. already have x86_cpu_to_node_map
That's a regression (probably from Mike's patches?). Until recently it was
used.
The reason the node number was put in there is that it generates far
shorter code to just fetch the local node number from the PDA than to
first go through a array lookup from the cpu number. It also saves a
costly cache line miss on the array if you're unlucky.
It is far better to fix it than to remove it.
I know Mike/Christoph want to get rid of the PDA and make per cpu data
as efficient as the PDA. If that happens the right fix is to create
a new per CPU data variable for the node number again.
Here's a quick patch (tested on kvm with numa emulation only)
It should be ok because PDA is set up early and
the early node is always 0 and there is a 0 in there
at early boot.
Saves about 1.6k of text on a vmlinux here.
Signed-off-by: Andi Kleen <[email protected]>
Index: linux/include/asm-x86/topology.h
===================================================================
--- linux.orig/include/asm-x86/topology.h
+++ linux/include/asm-x86/topology.h
@@ -34,11 +34,12 @@
extern int cpu_to_node_map[];
#else
+#include <asm/pda.h>
DECLARE_PER_CPU(int, x86_cpu_to_node_map);
extern int x86_cpu_to_node_map_init[];
extern void *x86_cpu_to_node_map_early_ptr;
/* Returns the number of the current Node. */
-#define numa_node_id() (early_cpu_to_node(raw_smp_processor_id()))
+#define numa_node_id() (read_pda(nodenumber))
#endif
extern cpumask_t node_to_cpumask_map[];
On Feb 18, 2008 4:23 AM, Andi Kleen <[email protected]> wrote:
> Yinghai Lu <[email protected]> writes:
>
> > we don't need copy too. already have x86_cpu_to_node_map
>
> That's a regression (probably from Mike's patches?). Until recently it was
> used.
>
> The reason the node number was put in there is that it generates far
> shorter code to just fetch the local node number from the PDA than to
> first go through a array lookup from the cpu number. It also saves a
> costly cache line miss on the array if you're unlucky.
>
> It is far better to fix it than to remove it.
>
> I know Mike/Christoph want to get rid of the PDA and make per cpu data
> as efficient as the PDA. If that happens the right fix is to create
> a new per CPU data variable for the node number again.
>
> Here's a quick patch (tested on kvm with numa emulation only)
>
> It should be ok because PDA is set up early and
> the early node is always 0 and there is a 0 in there
> at early boot.
>
> Saves about 1.6k of text on a vmlinux here.
that is because of the inline...
static inline int early_cpu_to_node(int cpu)
{
int *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
if (cpu_to_node_map)
return cpu_to_node_map[cpu];
else if (per_cpu_offset(cpu))
return per_cpu(x86_cpu_to_node_map, cpu);
else
return NUMA_NO_NODE;
}
should use per_cpu(x86_cpu_to_node_map, cpu) instead.
and limited using early_cpu_to_node etc.
YH
Andi Kleen wrote:
> Yinghai Lu <[email protected]> writes:
>
>> we don't need copy too. already have x86_cpu_to_node_map
>
> That's a regression (probably from Mike's patches?). Until recently it was
> used.
Yes, I had removed it because I couldn't find any references to it.
And reading one's own percpu variables should be as efficient as
reading one's own pda.
But if I missed something, I apologize. Thanks for catching the problem.
-Mike
>
> The reason the node number was put in there is that it generates far
> shorter code to just fetch the local node number from the PDA than to
> first go through a array lookup from the cpu number. It also saves a
> costly cache line miss on the array if you're unlucky.
>
> It is far better to fix it than to remove it.
>
> I know Mike/Christoph want to get rid of the PDA and make per cpu data
> as efficient as the PDA. If that happens the right fix is to create
> a new per CPU data variable for the node number again.
>
> Here's a quick patch (tested on kvm with numa emulation only)
>
> It should be ok because PDA is set up early and
> the early node is always 0 and there is a 0 in there
> at early boot.
>
> Saves about 1.6k of text on a vmlinux here.
>
> Signed-off-by: Andi Kleen <[email protected]>
>
> Index: linux/include/asm-x86/topology.h
> ===================================================================
> --- linux.orig/include/asm-x86/topology.h
> +++ linux/include/asm-x86/topology.h
> @@ -34,11 +34,12 @@
> extern int cpu_to_node_map[];
>
> #else
> +#include <asm/pda.h>
> DECLARE_PER_CPU(int, x86_cpu_to_node_map);
> extern int x86_cpu_to_node_map_init[];
> extern void *x86_cpu_to_node_map_early_ptr;
> /* Returns the number of the current Node. */
> -#define numa_node_id() (early_cpu_to_node(raw_smp_processor_id()))
> +#define numa_node_id() (read_pda(nodenumber))
> #endif
>
> extern cpumask_t node_to_cpumask_map[];
>
>
On Tue, Feb 19, 2008 at 07:48:54AM -0800, Mike Travis wrote:
> Andi Kleen wrote:
> > Yinghai Lu <[email protected]> writes:
> >
> >> we don't need copy too. already have x86_cpu_to_node_map
> >
> > That's a regression (probably from Mike's patches?). Until recently it was
> > used.
>
> Yes, I had removed it because I couldn't find any references to it.
Hmm, maybe it regressed earlier. Sorry if I blamed you incorrectly.
Anyways at some point this definitely worked. I remember writing
the code ;-)
> And reading one's own percpu variables should be as efficient as
> reading one's own pda.
Sure, but the point is that getting the current node is a common
operation, so it should be a single reference and not go through a
lookup table.
How it is actually implemented -- through PDA or through you new
equivalent per cpu variables -- does not really matter as long
as the resulting code is only a single instruction. Using
the lookup array from the cpu number is wrong.
My patch just fixed it back to use the PDA in the old style for now,
but if all your per cpu stuff is merged (I admit I haven't tracked
if it is or not) replacing that with a per cpu variable would work too
if it then generates the same code as with PDA.
-Andi
Andi Kleen wrote:
> On Tue, Feb 19, 2008 at 07:48:54AM -0800, Mike Travis wrote:
>> Andi Kleen wrote:
>>> Yinghai Lu <[email protected]> writes:
>>>
>>>> we don't need copy too. already have x86_cpu_to_node_map
>>> That's a regression (probably from Mike's patches?). Until recently it was
>>> used.
>> Yes, I had removed it because I couldn't find any references to it.
>
> Hmm, maybe it regressed earlier. Sorry if I blamed you incorrectly.
> Anyways at some point this definitely worked. I remember writing
> the code ;-)
>
>> And reading one's own percpu variables should be as efficient as
>> reading one's own pda.
>
> Sure, but the point is that getting the current node is a common
> operation, so it should be a single reference and not go through a
> lookup table.
>
> How it is actually implemented -- through PDA or through you new
> equivalent per cpu variables -- does not really matter as long
> as the resulting code is only a single instruction. Using
> the lookup array from the cpu number is wrong.
>
> My patch just fixed it back to use the PDA in the old style for now,
> but if all your per cpu stuff is merged (I admit I haven't tracked
> if it is or not) replacing that with a per cpu variable would work too
> if it then generates the same code as with PDA.
>
> -Andi
I'll look at it some more as I don't really have a preference either.
One thing that also bothered me was other cpus read the per cpu
variable to get the node number whilst the current cpu reads the pda
variable. I'll see about resolving that quirky-ness. (All one or all
the other.)
(And of course the problem with cpus on nodes with no local memory
needs to be resolved as well.)
Thanks,
Mike
On Tue, Feb 19, 2008 at 10:32:38AM -0800, Mike Travis wrote:
> Andi Kleen wrote:
> > On Tue, Feb 19, 2008 at 07:48:54AM -0800, Mike Travis wrote:
> >> Andi Kleen wrote:
> >>> Yinghai Lu <[email protected]> writes:
> >>>
> >>>> we don't need copy too. already have x86_cpu_to_node_map
> >>> That's a regression (probably from Mike's patches?). Until recently it was
> >>> used.
> >> Yes, I had removed it because I couldn't find any references to it.
> >
> > Hmm, maybe it regressed earlier. Sorry if I blamed you incorrectly.
> > Anyways at some point this definitely worked. I remember writing
> > the code ;-)
> >
> >> And reading one's own percpu variables should be as efficient as
> >> reading one's own pda.
> >
> > Sure, but the point is that getting the current node is a common
> > operation, so it should be a single reference and not go through a
> > lookup table.
> >
> > How it is actually implemented -- through PDA or through you new
> > equivalent per cpu variables -- does not really matter as long
> > as the resulting code is only a single instruction. Using
> > the lookup array from the cpu number is wrong.
> >
> > My patch just fixed it back to use the PDA in the old style for now,
> > but if all your per cpu stuff is merged (I admit I haven't tracked
> > if it is or not) replacing that with a per cpu variable would work too
> > if it then generates the same code as with PDA.
> >
> > -Andi
>
> I'll look at it some more as I don't really have a preference either.
> One thing that also bothered me was other cpus read the per cpu
> variable to get the node number whilst the current cpu reads the pda
> variable. I'll see about resolving that quirky-ness. (All one or all
> the other.)
This information should always come from the same variable.
> (And of course the problem with cpus on nodes with no local memory
> needs to be resolved as well.)
All CPUs get assigned to some node at boot.
And there should be always per cpu variables or pda to use.
-Andi
Andi Kleen wrote:
>> (And of course the problem with cpus on nodes with no local memory
>> needs to be resolved as well.)
>
> All CPUs get assigned to some node at boot.
>
> And there should be always per cpu variables or pda to use.
Sorry, I was being vague. Mel Gorman found a problem that appears
linked to the change to 16-bit apicids on an AMD box that has 4
nodes, but nodes 1-3 have no memory, and the system panics during
cpu_up() on the 2nd cpu which is on node 1.
I've not yet been able to replicate the problem.
Thanks,
Mike