Date: Tue, 2 Jun 2009 01:14:40 -0700 (PDT)
From: David Rientjes <rientjes@google.com>
To: Nick Piggin <npiggin@suse.de>
cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
       Andrew Morton <akpm@linux-foundation.org>,
       Rik van Riel <riel@redhat.com>, Mel Gorman <mel@csn.ul.ie>,
       Christoph Lameter <cl@linux-foundation.org>,
       Dave Hansen <dave@linux.vnet.ibm.com>, linux-kernel@vger.kernel.org
Subject: Re: [patch 3/3 -mmotm] oom: invoke oom killer for __GFP_NOFAIL
In-Reply-To: <20090602075836.GB16201@wotan.suse.de>
Message-ID: <alpine.DEB.2.00.0906020104480.27792@chino.kir.corp.google.com>
References: <alpine.DEB.2.00.0906011828040.6936@chino.kir.corp.google.com> <alpine.DEB.2.00.0906011830490.6936@chino.kir.corp.google.com> <20090601225602.3482cd0d.akpm@linux-foundation.org> <alpine.DEB.2.00.0906020016060.24915@chino.kir.corp.google.com>
 <1243928095.23657.5633.camel@twins> <20090602075836.GB16201@wotan.suse.de>
User-Agent: Alpine 2.00 (DEB 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org

On Tue, 2 Jun 2009, Nick Piggin wrote:

> > I would really prefer if we do as Andrew suggests. Both will fix this
> > problem, so I don't see it as a different topic at all.
> 
> Well, his patch, as it stands, is a good one. Because we do have
> potential higher order GFP_NOFAIL.
> 

There's currently an inconsistency in the definition of __GFP_NOFAIL and 
its implementation.  The clearly defined purpose of the flag is:

 * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
 * cannot handle allocation failures.

Yet __GFP_NOFAIL allocations may fail if direct reclaim makes no progress
and order > PAGE_ALLOC_COSTLY_ORDER.  That is the behavior both in git HEAD
and in Mel's allocator rework in mmotm.

I've been addressing this implicitly by requiring __GFP_NOFAIL to always
abide by its definition: we simply can never return NULL, because the
caller can't handle it (and, by definition, shouldn't even be responsible
for considering it).

With my patch, we kill a memory-hogging task that will free some memory so
the allocation can succeed (or multiple tasks if insufficient contiguous
memory is available).  Kernel callers opt in to __GFP_NOFAIL, so the
responsibility for this memory freeing lies entirely with the caller, not
with the page allocator.

My preference for handling this is to merge my patch (obviously :), and
then hopefully deprecate __GFP_NOFAIL as much as possible, although I doubt
it can ever be eradicated entirely.