Subject: Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
From: Vlastimil Babka
To: Michal Hocko, Linus Torvalds
Cc: Marc MERLIN, linux-mm, LKML, Joonsoo Kim, Tejun Heo, Greg Kroah-Hartman
Date: Wed, 23 Nov 2016 10:18:26 +0100
Message-ID: <5d506912-d2a1-379b-d384-0a48ec5ab707@suse.cz>
In-Reply-To: <20161123063410.GB2864@dhcp22.suse.cz>
References: <20161121154336.GD19750@merlins.org>
 <0d4939f3-869d-6fb8-0914-5f74172f8519@suse.cz>
 <20161121215639.GF13371@merlins.org>
 <20161122160629.uzt2u6m75ash4ved@merlins.org>
 <48061a22-0203-de54-5a44-89773bff1e63@suse.cz>
 <20161123063410.GB2864@dhcp22.suse.cz>

On 11/23/2016 07:34 AM, Michal Hocko wrote:
> On Tue 22-11-16 11:38:47, Linus Torvalds wrote:
>> On Tue, Nov 22, 2016 at 8:14 AM, Vlastimil Babka wrote:
>>>
>>> Thanks a lot for the testing. So what do we do now about 4.8? (4.7 is
>>> already EOL AFAICS).
>>>
>>> - send the patch [1] as 4.8-only stable.
>>
>> I think that's the right thing to do. It's pretty small, and the
>> argument that it changes the oom logic too much is pretty bogus, I
>> think. The oom logic in 4.8 is simply broken. Let's get it fixed.
>> Changing it is the point.
>
> The point I've tried to make is that it is not should_reclaim_retry
> which is broken.
> It's an overly optimistic reliance on the compaction
> to do its work which led to all those issues. My previous fix
> 31e49bfda184 ("mm, oom: protect !costly allocations some more for
> !CONFIG_COMPACTION") tried to cope with that by checking the order-0
> watermark which has proven to help most users. Now it didn't cover
> everybody obviously. Rather than fiddling with fine tuning of these
> heuristics I think it would be safer to simply admit that high order
> OOM detection doesn't work in 4.8 kernel and so do not declare the OOM
> killer for those requests at all. The risk of such a change is not big
> because there usually are order-0 requests happening all the time so if
> we are really OOM we would trigger the OOM eventually.
>
> So I am proposing this for 4.8 stable tree instead
> ---
> commit b2ccdcb731b666aa28f86483656c39c5e53828c7
> Author: Michal Hocko
> Date:   Wed Nov 23 07:26:30 2016 +0100
>
>     mm, oom: stop pre-mature high-order OOM killer invocations
>
>     31e49bfda184 ("mm, oom: protect !costly allocations some more for
>     !CONFIG_COMPACTION") was an attempt to reduce chances of pre-mature OOM
>     killer invocation for high order requests. It seemed to work for most
>     users just fine but it is far from bullet proof and obviously not
>     sufficient for Marc who has reported pre-mature OOM killer invocations
>     with 4.8 based kernels. 4.9 with all the compaction improvements seems
>     to be behaving much better but that would be too intrusive to backport
>     to 4.8 stable kernels. Instead this patch simply never declares OOM for
>     !costly high order requests. We rely on order-0 requests to do that in
>     case we are really out of memory. Order-0 requests are much more common
>     and so a risk of a livelock without any way forward is highly unlikely.
>
>     Reported-by: Marc MERLIN
>     Signed-off-by: Michal Hocko

This should effectively restore the 4.6 logic, so I'm fine with it for
stable, if it passes testing.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a2214c64ed3c..7401e996009a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3161,6 +3161,16 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
>  	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
>  		return false;
>  
> +#ifdef CONFIG_COMPACTION
> +	/*
> +	 * This is a gross workaround to compensate a lack of reliable compaction
> +	 * operation. We cannot simply go OOM with the current state of the compaction
> +	 * code because this can lead to pre mature OOM declaration.
> +	 */
> +	if (order <= PAGE_ALLOC_COSTLY_ORDER)
> +		return true;
> +#endif
> +
>  	/*
>  	 * There are setups with compaction disabled which would prefer to loop
>  	 * inside the allocator rather than hit the oom killer prematurely.
>
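
For readers following the thread, the quoted hunk can be read as a small
decision function. What follows is a minimal userspace sketch of that logic
under stated assumptions, not the kernel code. The name
sketch_should_compact_retry and both boolean parameters are hypothetical:
compaction_enabled stands in for the compile-time CONFIG_COMPACTION switch,
and order0_watermark_ok stands in for the order-0 watermark check that
31e49bfda184 added and that follows the new hunk in the real function.

```c
#include <assert.h>
#include <stdbool.h>

/* The kernel defines PAGE_ALLOC_COSTLY_ORDER as 3. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical model of the patched should_compact_retry() decision.
 * Returns true when the allocator should keep retrying instead of
 * moving on towards the OOM killer for this request.
 */
bool sketch_should_compact_retry(unsigned int order,
				 bool compaction_enabled,
				 bool order0_watermark_ok)
{
	/* Order-0 and costly requests are not handled by this retry path. */
	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
		return false;

	/*
	 * Past the early return, order is already in the !costly range, so
	 * the "order <= PAGE_ALLOC_COSTLY_ORDER" test in the hunk always holds.
	 */
	assert(order <= PAGE_ALLOC_COSTLY_ORDER);

	/* New in the patch: with compaction built in, always retry. */
	if (compaction_enabled)
		return true;

	/* Assumed model of the pre-existing path: retry while order-0 watermarks are OK. */
	return order0_watermark_ok;
}
```

In this model an order-2 request with compaction enabled always asks for
another retry, whatever the watermark flag says, which matches the commit
message's statement that OOM is never declared for !costly high order
requests. Order-0 requests still return false here, so they remain the path
by which a real out-of-memory condition gets detected.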