Date: Wed, 26 Jul 2017 13:46:39 +0200
From: Michal Hocko
To: Tetsuo Handa
Cc: linux-mm@kvack.org, hannes@cmpxchg.org, rientjes@google.com,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH] oom_reaper: close race without using oom_lock
Message-ID: <20170726114638.GL2981@dhcp22.suse.cz>
In-Reply-To: <201707262033.JGE65600.MOtQFFLOJOSFVH@I-love.SAKURA.ne.jp>

On Wed 26-07-17 20:33:21, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Sun 23-07-17 09:41:50, Tetsuo Handa wrote:
> > > So, how can we verify that the above race is a real problem?
> >
> > Try to simulate a _real_ workload and see whether we kill more tasks
> > than necessary.
>
> Whether it is a _real_ workload or not is not an answer.
>
> If somebody is trying to allocate hundreds or thousands of pages after the
> memory of an OOM victim was reaped, avoiding this race window makes no sense;
> the next OOM victim will be selected anyway. But if somebody is trying to
> allocate only one page and is then planning to release a lot of memory,
> avoiding this race window can save somebody from being OOM-killed needlessly.
> This race window depends on what the threads are about to do, not on whether
> the workload is natural or artificial.

And with a desperate lack of a crystal ball we cannot do much about
that, really.

> My question is, how can users know whether somebody was OOM-killed needlessly
> because MMF_OOM_SKIP was allowed to race?

Is it really important to know that the race is due to MMF_OOM_SKIP?
Isn't it sufficient to see that we kill too many tasks and then debug it
further once something hits that?

[...]

> Is it guaranteed that __node_reclaim() never (even indirectly) waits for
> __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation?

This is direct reclaim, which can go down to the slab shrinkers with all
the usual fun...

> > Such races are unfortunate but
> > unavoidable unless we synchronize the oom kill with any memory freeing,
> > which smells like a no-go to me. We can try a last allocation attempt right
> > before we go and kill something (which still wouldn't be race free) but
> > that might cause other issues - e.g. prolonged thrashing without ever
> > killing something - but I haven't evaluated those to be honest.
>
> Yes, postponing the last get_page_from_freelist() attempt until
> oom_kill_process() would be the best we could afford.

As I've said, this would have to be evaluated very carefully and a
strong use case would have to be shown.
-- 
Michal Hocko
SUSE Labs
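
To make the idea in the last exchange concrete, here is a minimal userspace
sketch (not kernel code) of "retry the allocation one last time right before
picking a new OOM victim". The names try_alloc_page(), select_and_kill_victim()
and the previous_victim_reaped flag are hypothetical stand-ins for
get_page_from_freelist(), oom_kill_process() and the state after the oom_reaper
has finished with the previous victim; none of this is the actual proposed patch.

/*
 * Minimal userspace model of the "last allocation attempt before the
 * kill" idea discussed above.  This is NOT the kernel implementation;
 * try_alloc_page() and select_and_kill_victim() are hypothetical
 * stand-ins for get_page_from_freelist() and oom_kill_process(), and
 * the flag below simulates the oom_reaper having already freed the
 * previous victim's memory (MMF_OOM_SKIP set).
 */
#include <stdbool.h>
#include <stdio.h>

static bool previous_victim_reaped = true;

/* Models get_page_from_freelist(): succeeds once reaped memory is back. */
static bool try_alloc_page(void)
{
	return previous_victim_reaped;
}

/* Models oom_kill_process(): pick and kill another task. */
static void select_and_kill_victim(void)
{
	printf("selecting and killing another task\n");
}

static void out_of_memory_model(void)
{
	/*
	 * Last-chance attempt right before choosing a victim: if the
	 * previous victim was reaped between the failed allocation and
	 * this point, the attempt succeeds and no further kill is
	 * needed.  The race window is narrowed, not closed, because
	 * memory can still be freed the instant after this check.
	 */
	if (try_alloc_page()) {
		printf("late allocation succeeded, no new victim needed\n");
		return;
	}
	select_and_kill_victim();
}

int main(void)
{
	out_of_memory_model();
	return 0;
}

With previous_victim_reaped set, the model reports that the late allocation
succeeded and no new victim is needed; flipping it to false falls through to
the kill, which is exactly the trade-off (possible prolonged thrashing versus
a needless kill) that would have to be evaluated.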