Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755107Ab0GMVIg (ORCPT ); Tue, 13 Jul 2010 17:08:36 -0400 Received: from smtp-out.google.com ([216.239.44.51]:27863 "EHLO smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751649Ab0GMVIe (ORCPT ); Tue, 13 Jul 2010 17:08:34 -0400 DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns; h=date:from:x-x-sender:to:cc:subject:in-reply-to:message-id: references:user-agent:mime-version:content-type:x-system-of-record; b=gf+TFkwuXPq9clFpX2Dfd+RIicDZk59Fr8LIUBdQOJAwmHmdESCYCUlYfYym/fSYl 7jya4qx9BDrBDjUTTGH6w== Date: Tue, 13 Jul 2010 14:08:28 -0700 (PDT) From: David Rientjes X-X-Sender: rientjes@chino.kir.corp.google.com To: KOSAKI Motohiro cc: Peter Zijlstra , Michel Lespinasse , linux-mm , LKML , Divyesh Shah , Ingo Molnar Subject: Re: FYI: mmap_sem OOM patch In-Reply-To: <20100713091333.EA3E.A69D9226@jp.fujitsu.com> Message-ID: References: <20100708200324.CD4B.A69D9226@jp.fujitsu.com> <20100713091333.EA3E.A69D9226@jp.fujitsu.com> User-Agent: Alpine 2.00 (DEB 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-System-Of-Record: true Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3676 Lines: 71 On Tue, 13 Jul 2010, KOSAKI Motohiro wrote: > > > I disagree. __GFP_NOFAIL mean this allocation failure can makes really > > > dangerous result. Instead, OOM-Killer should try to kill next process. > > > I think. > > > > > > > That's not what happens, __alloc_pages_high_priority() will loop forever > > for __GFP_NOFAIL, the oom killer is never recalled. > > Yup, please reread the discusstion. > There's nothing in the discussion that addresses the fact that __alloc_pages_high_priority() loops infinitely for __GFP_NOFAIL without again calling the oom killer. You may be proposing a change to that when you said "OOM-Killer should try to kill next process. I think," but I can't speculate on that. If you are, please propose a patch. The success of whether we can allocate without watermarks is dependent on where we set them and what types of exclusions we allow. This isn't local only to the page allocator but rather to the entire kernel since GFP_ATOMIC allocations, for example, can deplete the min watermark by ~60%. The remaining memory is what we allow access to for ALLOC_NO_WATERMARKS: those allocations in the direct reclaim patch and those that have been oom killed. Thus, it's important that oom killed tasks do not completely deplete memory reserves, otherwise __GFP_NOFAIL allocations will loop forever without killing additional tasks, as you say. That's sane since oom killing additional tasks wouldn't allow them access to any additional memory, anyway, so the allocation would still fail (we only call the oom killer for those allocations that are retried) and the victim could not exit. With that said, I'm not exactly sure what change you're proposing when you say the oom killer should try to kill another process because it won't lead to any future memory freeing unless the victim can exit without allocating memory. If that's not possible, you've proliferated a ton of innocent kills that were triggered because of one __GFP_NOFAIL attempt. I agree with Peter that it would be ideal to remove __GFP_NOFAIL, but I think we require changes in the retry logic first before that is possible. Right now, we insist on retrying all blockable !__GFP_NORETRY allocations under PAGE_ALLOC_COSTLY_ORDER indefinitely. That, in a sense, is already __GFP_NOFAIL behavior that is implicit: that's why we typically see __GFP_NOFAIL with GFP_NOFS instead. With GFP_NOFS, we never kill the oom killer in the first place, so memory allocation is only more successful on a subsequent attempt by either direct reclaim or memory compaction. There's nothing preventing users from doing do { page = alloc_page(GFP_KERNEL); } while (!page); if the retry logic were reworked to start failing allocations or we removed __GFP_NOFAIL. Thus, I think __GFP_NOFAIL should be substituted with a different gfp flag that acts similar but failable: use compaction, reclaim, and the oom killer where possible and in that order if there is no success and then retry to a high watermark one final time. If the allocation is still unsuccessful, return NULL. This allows users to do what getblk() does, for instance, by implementing their own memory freeing functions. It also allows them to use all of the page allocator's powers (compaction, reclaim, oom killer) without infinitely looping by either using an order under PAGE_ALLOC_COSTLY_ORDER or insisting on __GFP_NOFAIL. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/