Date: Thu, 18 May 2006 20:12:07 -0700
From: Paul Jackson <pj@sgi.com>
To: David Chinner <dgc@sgi.com>
Cc: akpm@osdl.org, Simon.Derr@bull.net, linux-kernel@vger.kernel.org,
       nickpiggin@yahoo.com.au, clameter@sgi.com
Subject: Re: [PATCH 01/03] Cpuset: might sleep checking zones allowed fix
Message-Id: <20060518201207.87b6a244.pj@sgi.com>
In-Reply-To: <20060519022144.GT1390195@melbourne.sgi.com>
References: <20060518043556.15898.73616.sendpatchset@jackhammer.engr.sgi.com>
	<20060517222543.600cb20a.akpm@osdl.org>
	<20060518054750.GN1390195@melbourne.sgi.com>
	<20060518174800.f13e2c86.pj@sgi.com>
	<20060519022144.GT1390195@melbourne.sgi.com>
Organization: SGI
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3419
Lines: 83

David wrote:
> Basically, Case B falls back to case A when the cpuset is
> full. So my question really is whether we need to attempt
> to allocaate within the cpuset for GFP_ATOMIC because
> most of the time the local node will be within the cpuset
> anyway....
> 
> So that's what lead to me asking this - is there really a
> noticable distinction between A and B, or is it just
> cluttering up the code with needless complex logic?

Perhaps I'm missing something, but that's what I thought you were
asking, and that's what I tried to answer, in my last post, saying:

pj wrote:
> I suspect we could do this, and it might be a good idea.  There may
> well not be good enough reason to be making a special case of [B] above.


You sound frustrated that I am not understanding your question,
and I am feeling a little frustrated that you don't seem to have
realized that I thought I already recognized and responded to your
question, with an answer sympathetic to your concerns, and a possible
patch to address them.


David wrote:
> Why not simply check this is __cpuset_zone_allowed() and return
> true? We shouldn't put the burden of getting this right on the
> callers when it is something internal to the cpuset workings....

The callers are already conscious of whether or not they can wait.
For all of the callers of cpuset_zone_allowed() except __alloc_pages,
they can very well wait, and such a check is noise.  For __alloc_pages,
it is quite consciously managing what actions it takes based on what
can wait and what can't.

Please see my belts and suspenders metaphor in the previous message.
or the kfree analogy.

In some programming contexts, I add redundancy for robustness, and
in some contexts I minimize redundancy for lean and mean code.  The
kernel tends to be the latter, especially on important code paths.  In
particular, I have spent quite a bit of effort over the last year or
two, reducing to a minimum the number of instructions, cache lines,
locks and conditional jumps imposed on the memory allocation code path
by cpusets.


> > But what do we do if 'wait' is not set, such as when in interrupt or
> > for GFP_ATOMIC requests.  Calling cpuset_zone_allowed() is no longer
> > allowed in that case.
> 
> Sorry, I don't follow why you'd think that this would be not
> allowed. Can you explain this further?

I've probably already written enough words for today on why I suggested
having cpuset_zone_allowed() not be called if it could not wait.

And even if it could be called, say with the "__GFP_WAIT" check you
suggest, there is still the question of what kswapd daemons to wake up
when a memory allocation off an interrupt comes up short of memory on
all nodes.  Picking off the nodes in the interrupted tasks cpuset seems
to be rather arbitrary, at best.


> Why push another wait flag around when there's already one in the
> gfp_mask?

Good point - I was just in the habit of using the local variable
'wait' in the __alloc_pages code, and not re-extracting it from
gfp_mask.


-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/