Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754708AbZGAKWZ (ORCPT ); Wed, 1 Jul 2009 06:22:25 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753911AbZGAKWS (ORCPT ); Wed, 1 Jul 2009 06:22:18 -0400 Received: from gir.skynet.ie ([193.1.99.77]:45245 "EHLO gir.skynet.ie" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752731AbZGAKWR (ORCPT ); Wed, 1 Jul 2009 06:22:17 -0400 Date: Wed, 1 Jul 2009 11:22:18 +0100 From: Mel Gorman To: David Rientjes Cc: Andrew Morton , Linus Torvalds , Pekka Enberg , arjan@infradead.org, linux-kernel@vger.kernel.org, Christoph Lameter , Nick Piggin Subject: Re: upcoming kerneloops.org item: get_page_from_freelist Message-ID: <20090701102218.GB16355@csn.ul.ie> References: <20090624130121.99321cca.akpm@linux-foundation.org> <20090624145615.2ff9e56e.akpm@linux-foundation.org> <20090629153007.GD5065@csn.ul.ie> <20090629122029.93cdcc39.akpm@linux-foundation.org> <20090630110011.GB17561@csn.ul.ie> <20090630203249.GC6689@csn.ul.ie> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.17+20080114 (2008-01-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3886 Lines: 92 On Tue, Jun 30, 2009 at 01:51:07PM -0700, David Rientjes wrote: > On Tue, 30 Jun 2009, Mel Gorman wrote: > > > > This will panic the machine if current is the only user thread running or > > > eligible for oom kill (all others could have mm's with OOM_DISABLE set). > > > Currently, such tasks can exit or kthreads can free memory so that the oom > > > is recoverable. > > > > > > > Good point, would the following be ok instead? > > > > + if (test_tsk_thread_flag(p, TIF_MEMDIE)) { > > + if (p == current) { > > + chosen = ERR_PTR(-1UL); > > + continue; > > + } else > > + return ERR_PTR(-1UL); > > > > Yes, if you wanted to allow multiple threads to have TIF_MEMDIE set. > While it would be possible, I'm thinking that it would be unlikely for multiple TIF_MEMDIE processes to exist like this unless they were all __GFP_NOFAIL. For a second thread to be set TIF_MEMDIE, one process needs to OOM-kill, select itself, fail the next allocation and OOM-kill again. For it to be looping, each thread would also need __GFP_NOFAIL so while it's possible for multiple threads to get TIF_MEMDIE, it's exceedingly difficult and the system needs to be in a bad state. Do you agree that the following hunk makes sense at least? + /* Do not loop if OOM-killed unless __GFP_NOFAIL is specified */ + if (test_thread_flag(TIF_MEMDIE)) { + if (gfp_mask & __GFP_NOFAIL) + WARN(1, "Potential infinite loop with __GFP_NOFAIL"); + else + return 0; + } With that on its own, TIF_MEMDIE processes will exit the page allocator unless __GFP_NOFAIL set and warn when the potential infinite loop is happening. In combination with your patch to recalculate alloc_flags after the OOM killer, I think TIF_MEMDIE would be much closer to behaving sensibly. Then a patch that addresses potential-infinite-loop-TIF_MEMDIE-__GFP_NOFAIL can be considered. > > > The problem with this approach is that it increases the liklihood that > > > memory reserves will be totally depleted when several threads are > > > competing for them. > > > > > > > How so? > > > > We automatically oom kill current if it's PF_EXITING to give it TIF_MEMDIE > because we know it's on the exit path, this avoids allowing them to > allocate below the min watermark for the allocation that triggered the > oom, which could be significant. > Ok, your patch is important for this situation. > If several threads are ooming at the same time, which happens quite often > for non-cpuset and non-memcg constrained ooms (and those not restricted to > lowmem), we could now potentially have nr_cpus threads with TIF_MEMDIE set > simultaneously, which increases the liklihood that memory reserves will be > fully depleted after each allocation that triggered the oom killer now > succeeds because of ALLOC_NO_WATERMARKS. This is somewhat mitigated by > the oom killer serialization done on the zonelist, but nothing guarantees > that reserves aren't empty before one of the threads has detached its > ->mm. > While I think this situation is exceedingly unlikely to occur because of the required combination of GFP flags and memory pressure, I see your point. > oom_kill_task() SIGKILLs threads sharing ->mm with different tgid's > instead of giving them access to memory reserves specifically to avoid > this. > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/