Date: Mon, 24 Jul 2017 00:01:13 -0700 (PDT)
From: Hugh Dickins
To: Tetsuo Handa
cc: hughd@google.com, mhocko@kernel.org, akpm@linux-foundation.org,
    mgorman@suse.de, riel@redhat.com, hannes@cmpxchg.org, vbabka@suse.cz,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, mhocko@suse.com
Subject: Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever
In-Reply-To: <201707201944.IJI05796.VLFJFFtSQMOOOH@I-love.SAKURA.ne.jp>
References: <20170710074842.23175-1-mhocko@kernel.org>
 <201707201944.IJI05796.VLFJFFtSQMOOOH@I-love.SAKURA.ne.jp>

On Thu, 20 Jul 2017, Tetsuo Handa wrote:
> Hugh Dickins wrote:
> > You probably won't welcome getting into alternatives at this late stage;
> > but after hacking around it one way or another because of its pointless
> > lockups, I lost patience with that too_many_isolated() loop a few months
> > back (on realizing the enormous number of pages that may be isolated via
> > migrate_pages(2)), and we've been running nicely since with something like:
> >
> > 	bool got_mutex = false;
> >
> > 	if (unlikely(too_many_isolated(pgdat, file, sc))) {
> > 		if (mutex_lock_killable(&pgdat->too_many_isolated))
> > 			return SWAP_CLUSTER_MAX;
> > 		got_mutex = true;
> > 	}
> > 	...
> > 	if (got_mutex)
> > 		mutex_unlock(&pgdat->too_many_isolated);
> >
> > Using a mutex to provide the intended throttling, without an infinite
> > loop or an arbitrary delay; and without having to worry (as we often did)
> > about whether those numbers in too_many_isolated() are really appropriate.
> > No premature OOMs complained of yet.
>
> Roughly speaking, there is a moment where shrink_inactive_list() acts
> like below.
>
> 	bool got_mutex = false;
>
> 	if (!current_is_kswapd()) {
> 		if (mutex_lock_killable(&pgdat->too_many_isolated))
> 			return SWAP_CLUSTER_MAX;
> 		got_mutex = true;
> 	}
>
> 	// kswapd is blocked here waiting for !current_is_kswapd().

That would be a shame, for kswapd to wait for !current_is_kswapd()!
But seriously, I think I understand what you mean by that, you're
thinking that kswapd would be waiting on some other task to clear
the too_many_isolated() condition?

No, it does not work that way: kswapd (never seeing too_many_isolated()
because that always says false when current_is_kswapd()) never tries to
take the pgdat->too_many_isolated mutex itself: it does not wait there
at all, although other tasks may be waiting there at the time.

Perhaps my naming the mutex "too_many_isolated", same as the function,
is actually confusing, when I had intended it to be helpful.

> 	if (got_mutex)
> 		mutex_unlock(&pgdat->too_many_isolated);
>
> > But that was on a different kernel, and there I did have to make sure
> > that PF_MEMALLOC always prevented us from nesting: I'm not certain of
> > that in the current kernel (but do remember Johannes changing the memcg
> > end to make it use PF_MEMALLOC too).
> >
> > I offer the preview above, to see
> > if you're interested in that alternative: if you are, then I'll go ahead
> > and make it into an actual patch against v4.13-rc.
>
> I don't know what your actual patch looks like, but the problem is that
> pgdat->too_many_isolated waits for kswapd while kswapd waits for
> pgdat->too_many_isolated; nobody can unlock pgdat->too_many_isolated if
> once we hit it.

Not so (and we'd hardly be finding it a useful patch if that were so).

Hugh
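For concreteness, a rough sketch of how the two pieces fit together in
mm/vmscan.c (a sketch only, not the actual patch: it assumes a struct mutex
named too_many_isolated added to struct pglist_data and initialised along
with the rest of the node, and it elides everything else in both functions):

	/*
	 * too_many_isolated() (the function) bails out early for kswapd,
	 * so only direct reclaimers can reach the mutex_lock_killable()
	 * below: kswapd never waits on the mutex of the same name.
	 */
	static int too_many_isolated(struct pglist_data *pgdat, int file,
				     struct scan_control *sc)
	{
		if (current_is_kswapd())
			return 0;	/* kswapd is never throttled here */
		...			/* usual isolated vs inactive check */
	}

	static noinline_for_stack unsigned long
	shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
			     struct scan_control *sc, enum lru_list lru)
	{
		bool got_mutex = false;
		...
		if (unlikely(too_many_isolated(pgdat, file, sc))) {
			/* throttle direct reclaim; a fatal signal backs out */
			if (mutex_lock_killable(&pgdat->too_many_isolated))
				return SWAP_CLUSTER_MAX;
			got_mutex = true;
		}
		...			/* isolate, shrink, put back as before */
		if (got_mutex)
			mutex_unlock(&pgdat->too_many_isolated);
		...
	}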