To: hannes@cmpxchg.org, mhocko@suse.cz
Cc: david@fromorbit.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
        linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
        akpm@linux-foundation.org, ying.huang@intel.com, aarcange@redhat.com,
        tytso@mit.edu
Subject: Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
References: <20150326195822.GB28129@dastard>
	<20150327150509.GA21119@cmpxchg.org>
	<20150330003240.GB28621@dastard>
	<20150401151920.GB23824@dhcp22.suse.cz>
	<20150407141822.GA3262@cmpxchg.org>
In-Reply-To: <20150407141822.GA3262@cmpxchg.org>
Message-Id: <201504111629.FIB81218.QStJFFVFOLOMHO@I-love.SAKURA.ne.jp>
Date: Sat, 11 Apr 2015 16:29:26 +0900
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2147
Lines: 47

Johannes Weiner wrote:
> The argument here was always that NOFS allocations are very limited in
> their reclaim powers and will trigger OOM prematurely.  However, the
> way we limit dirty memory these days forces most cache to be clean at
> all times, and direct reclaim in general hasn't been allowed to issue
> page writeback for quite some time.  So these days, NOFS reclaim isn't
> really weaker than regular direct reclaim.  The only exception is that
> it might block writeback, so we'd go OOM if the only reclaimables left
> were dirty pages against that filesystem.  That should be acceptable.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 47981c5e54c3..fe3cb2b0b85b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2367,16 +2367,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>  		/* The OOM killer does not needlessly kill tasks for lowmem */
>  		if (ac->high_zoneidx < ZONE_NORMAL)
>  			goto out;
> -		/* The OOM killer does not compensate for IO-less reclaim */
> -		if (!(gfp_mask & __GFP_FS)) {
> -			/*
> -			 * XXX: Page reclaim didn't yield anything,
> -			 * and the OOM killer can't be invoked, but
> -			 * keep looping as per tradition.
> -			 */
> -			*did_some_progress = 1;
> -			goto out;
> -		}
>  		if (pm_suspended_storage())
>  			goto out;
>  		/* The OOM killer may not free memory on a specific node */
> 

I think this change will allow calling out_of_memory() which results in
"oom_kill_process() is trivially called via pagefault_out_of_memory()"
problem described in https://lkml.org/lkml/2015/3/18/219 .

I myself think that we should trigger OOM killer for !__GFP_FS allocation
in order to make forward progress in case the OOM victim is blocked.
So, my question about this change is whether we can accept involving OOM
killer from page fault, no matter how trivially OOM killer will kill some
process?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/