Message-ID: <433BEED6.6000008@cosmosbay.com>
Date: Thu, 29 Sep 2005 15:40:38 +0200
From: Eric Dumazet <dada1@cosmosbay.com>
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
MIME-Version: 1.0
To: Andi Kleen <ak@suse.de>
CC: linux-kernel@vger.kernel.org
Subject: [NUMA , x86_64] Why memnode_shift is chosen with the lowest possible
 value ?
References: <1127939141.26401.32.camel@localhost.localdomain>	<1127939593.26401.38.camel@localhost.localdomain>	<20050928232027.28e1bb93.akpm@osdl.org> <p73k6h0jjh3.fsf@verdi.suse.de>
In-Reply-To: <p73k6h0jjh3.fsf@verdi.suse.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1746
Lines: 55

Hi Andi

I have a dual Opteron machine, with 8GB of ram on each node.
With latest kernels I have high CPU profiles in mm/slab.c
kfree() is NUMA aware, so far so good, but the price seems heavy.

I noticed in 2.6.14-rc2 syslog :

Node 0 MemBase 0000000000000000 Limit 00000001ffffffff
Node 1 MemBase 0000000200000000 Limit 00000003ffffffff
Using 23 for the hash shift. Max adder is 3ffffffff

instead of previous (2.6.13) :

Node 0 MemBase 0000000000000000 Limit 00000001ffffffff
Node 1 MemBase 0000000200000000 Limit 00000003ffffffff
Using 27 for the hash shift. Max adder is 3ffffffff

After some code review, I see NODEMAPSIZE raised from 0xff to 0xfff

  phys_to_nid() is now reading one byte out of 2048 bytes with 
(memnode_shift=23, units of 8MB).

But shouldnt we try to use the highest possible value for memnode_shift ?

Using memnode_shift=33 would access only 2 bytes from this memnodemap[], 
touching fewer cache lines (well , one cache line). kfree() and friends would 
be slightly faster, at least cache friendly.


Another question is :

Could we add in pda (struct x8664_pda) the node of the cpu ?

We currently do :

#define numa_node_id()             (cpu_to_node(raw_smp_processor_id()))

Instead of reading the processor_id from pda, then access cpu_to_node[], we 
could directly get this information from pda.

#if defined(CONFIG_NUMA)
static inline __attribute_pure__ int numa_node_id() { return read_pda(node);}
#else
#define numa_node_id()             0
#endif

Thank you

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/