Date: Mon, 6 Feb 2006 01:32:27 -0800
From: Paul Jackson
To: Andrew Morton
Cc: dgc@sgi.com, steiner@sgi.com, Simon.Derr@bull.net, ak@suse.de, linux-kernel@vger.kernel.org, clameter@sgi.com
Subject: Re: [PATCH 1/5] cpuset memory spread basic implementation

Andrew wrote:
> Well I agree.

Good.

> And I think that the only way we'll get peak performance for an
> acceptably broad range of applications is to provide many fine-grained
> controls and the appropriate documentation and instrumentation to help
> developers and administrators use those controls.
>
> We're all on the same page here.  I'm questioning whether slab and
> pagecache should be inextricably lumped together though.

They certainly don't need to be lumped.  I just don't go about creating
additional mechanism or apparatus until I smell the need.  (Well,
sometimes I do -- too much fun. ;)

When Andrew Morton, who has far more history with this code than I,
recommends such additional mechanism, that's all the smelling I need.

How fine-grained would you recommend, Andrew?  Is page cache vs. slab
cache the appropriate level of granularity?

> Is it possible to integrate the slab and pagecache allocation policies
> more cleanly into a process's mempolicy?  Right now, MPOL_* are disjoint.
>
> (Why is the spreading policy part of cpusets at all?  Shouldn't it be
> part of the mempolicy layer?)

The NUMA mempolicy code handles per-task, task-internal memory placement
policy, and the cpuset code handles cpuset-wide cpu and memory placement
policy.

In actual usage, spreading the kernel caches of a job is very much a
decision made per-job(*), by the system administrator or batch scheduler,
not by the application coder.

The application code may well be -very- aware of the placement of its
data pages in user address space, and to manage this it will use calls
such as mbind and set_mempolicy, in addition to using node-local
placement (arranging to fault in each page from a thread running on the
node that is to receive that page).  The application has no interest in
micromanaging the kernel's placement of page and slab caches, other than
choosing between the node-local and cpuset-spread strategies.

(*) Actually, made per-cpuset, not per-job.  But where this matters,
those tend to be the same thing.
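For concreteness, here is a minimal user-space sketch of the calls
mentioned above -- illustration only, not from this patch series, using
the libnuma wrappers declared in <numaif.h> (link with -lnuma); the node
numbers are arbitrary:

#include <numaif.h>
#include <sys/mman.h>
#include <string.h>

int main(void)
{
	unsigned long interleave_nodes = 0x3;	/* nodes 0 and 1 */
	unsigned long bind_node = 0x1;		/* node 0 only */
	size_t len = 4UL << 20;			/* 4 MB */
	void *buf;

	/* Task-wide default policy: interleave new pages over nodes 0-1. */
	if (set_mempolicy(MPOL_INTERLEAVE, &interleave_nodes,
			  8 * sizeof(interleave_nodes)))
		return 1;

	/* Per-mapping policy: bind this one region to node 0. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;
	if (mbind(buf, len, MPOL_BIND, &bind_node,
		  8 * sizeof(bind_node), 0))
		return 1;

	/*
	 * Node-local placement needs no syscall at all -- it is
	 * first-touch: a thread running on the target node touches
	 * the page, and the kernel allocates it there.
	 */
	memset(buf, 0, len);
	return 0;
}

The per-cpuset choice, by contrast, is just a pair of boolean flags on
the cpuset (the memory_spread_page and memory_spread_slab flags this
series adds) that the administrator or batch scheduler flips, with no
application code involved.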
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson 1.925.600.0401