From: Andi Kleen
To: Thomas Zehetbauer
Cc: linux-kernel@vger.kernel.org
Subject: Re: NUMA API observations
Date: Tue, 15 Jun 2004 15:27:15 +0200
In-Reply-To: <27lI4-29E-19@gated-at.bofh.it> (Thomas Zehetbauer's message of "Tue, 15 Jun 2004 15:00:24 +0200")

Thomas Zehetbauer writes:

> Looking at these numastat results and the default policy it seems that
> memory is primarily allocated on the first node, which in turn means an
> unnecessarily large number of page faults on the second node.

NUMA memory policy has nothing to do with page faults.

If most allocations land on the first node, it either means most programs
run on the first node (assuming they don't use the NUMA API to change
their memory affinity) or, more likely, that the programs running on
node 0 need more memory than those running on node 1.

That's easily possible: a typical desktop uses most of its memory in the
X server. If it runs on node 0 you get such skewed statistics. On servers
it is often similar.

One way to combat that, if it were really a problem, would be to run the
X server with an interleaving policy (numactl --interleave=all XFree86)[1],
but I would recommend careful benchmarks first to see whether it's really
a win. Normally the better local memory latency is the better choice.

[1] Don't do that with startx or xinit; the rest of the X session should
probably not use that policy.

> I wonder if it is possible to better balance processes among the nodes
> by e.g. setting nodeAffinity = pid mod nodeCount

I assume you mean scheduling, not memory affinity, here. execve() and
clone() already do something like that (but based on node loads, not
pids); fork() does not.

-Andi
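
For reference, the same interleave policy can be requested from inside a
program with libnuma instead of the numactl wrapper. This is a minimal
sketch, assuming the current libnuma interface (numa_set_interleave_mask()
with numa_all_nodes_ptr); older releases of the library took a nodemask_t
instead of a struct bitmask *, so treat it as illustrative only:

	/* Sketch: ask for interleaved allocations from inside a process,
	 * roughly what "numactl --interleave=all <cmd>" arranges for its
	 * child.  Assumes the current libnuma API; build with -lnuma. */
	#include <numa.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	int main(void)
	{
		if (numa_available() < 0) {
			fprintf(stderr, "kernel has no NUMA support\n");
			return 1;
		}

		/* Spread all future allocations of this task round-robin
		 * over all nodes instead of allocating locally. */
		numa_set_interleave_mask(numa_all_nodes_ptr);

		/* Touch a large buffer so the pages are actually allocated;
		 * numastat should then show them balanced across nodes. */
		size_t sz = 64UL << 20;
		char *buf = malloc(sz);
		if (!buf)
			return 1;
		memset(buf, 0, sz);

		printf("allocated %zu MiB with interleave policy\n", sz >> 20);
		free(buf);
		return 0;
	}

The trade-off is the one mentioned above: interleaving evens out node
occupancy and memory bandwidth at the cost of the lower latency that
local allocation normally gives you, which is why it should be
benchmarked before being made the default for something like the X
server.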
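
Purely as an illustration of the user-space side: the "nodeAffinity =
pid mod nodeCount" idea can be approximated without touching the
scheduler by having each process pin itself to a node with libnuma. This
is a rough sketch assuming libnuma's numa_max_node() and
numa_run_on_node(); note that it ignores node load entirely, which is
exactly why the kernel balances on load rather than on pids:

	/* Rough user-space sketch of the pid-mod-nodeCount idea from the
	 * question.  Illustrative only; build with -lnuma. */
	#include <numa.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		if (numa_available() < 0) {
			fprintf(stderr, "kernel has no NUMA support\n");
			return 1;
		}

		int node_count = numa_max_node() + 1;
		int node = getpid() % node_count;

		/* Restrict this task (and its future children) to the CPUs
		 * of one node; with the default local-allocation policy its
		 * memory then follows it onto that node. */
		if (numa_run_on_node(node) < 0) {
			perror("numa_run_on_node");
			return 1;
		}

		printf("pid %d pinned to node %d of %d\n",
		       (int)getpid(), node, node_count);
		/* ... do the real work here ... */
		return 0;
	}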