Date: Fri, 15 Nov 2019 07:49:26 +1100
From: Dave Chinner
To: Brian Foster
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 09/28] mm: directed shrinker work deferral
Message-ID: <20191114204926.GC4614@dread.disaster.area>
References: <20191031234618.15403-1-david@fromorbit.com> <20191031234618.15403-10-david@fromorbit.com> <20191104152525.GA10665@bfoster>
In-Reply-To: <20191104152525.GA10665@bfoster>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 04, 2019 at 10:25:25AM -0500, Brian Foster wrote:
> On Fri, Nov 01, 2019 at 10:45:59AM +1100, Dave Chinner wrote:
> > From: Dave Chinner
> >
> > Introduce a mechanism for ->count_objects() to indicate to the
> > shrinker infrastructure that the reclaim context will not allow
> > scanning work to be done and so the work it decides is necessary
> > needs to be deferred.
> >
> > This simplifies the code by separating out the accounting of
> > deferred work from the actual doing of the work, and allows better
> > decisions to be made by the shrinker control logic on what action it
> > can take.
> >
> > Signed-off-by: Dave Chinner
> > ---
>
> My understanding from the previous discussion(s) is that this is not
> tied directly to the gfp mask because that is not the only intended use.
> While it is currently a boolean tied to the entire shrinker call,
> the longer term objective is per-object granularity.

Longer term, yes, but right now such things are not possible as the
shrinker needs more context to be able to make sane per-object
decisions.

Shrinker policy decisions that affect the entire run scope should be
handled by the ->count operation - it's the one that says whether the
scan loop should run or not, and right now GFP_NOFS for all filesystem
shrinkers is a pure boolean policy implementation.

The next step is to provide a superblock context with GFP_NOFS to
indicate which filesystem we cannot recurse into. That is also a
shrinker instance wide check, so again it's something that ->count
should be deciding.

i.e. ->count determines what is to be done, ->scan iterates the work
that has to be done until we are done.
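
To make the ->count / ->scan split concrete, here is a minimal
user-space sketch. It is not kernel code: every sketch_* name, the
fs_allowed flag (standing in for "the gfp mask allows filesystem
recursion") and the batch size are assumptions made up for
illustration, and the scan target is simplified to "everything that
was counted". The sketch also clears defer_work before calling count,
which is a choice of the sketch rather than something the posted hunk
does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the reclaim context; not the kernel's struct. */
struct sketch_shrink_control {
	bool fs_allowed;           /* models "gfp mask permits fs recursion" */
	unsigned long nr_to_scan;  /* batch size handed to each scan call */
	bool defer_work;           /* set by count when scanning must not run */
};

/* Hypothetical cache with a tally of work pushed to a later context. */
struct sketch_cache {
	unsigned long nr_objects;
	unsigned long nr_deferred; /* recorded only; folding it back in is not modelled */
};

#define SKETCH_BATCH 128UL

/* count: says what is to be done, including the run/no-run policy. */
static unsigned long sketch_count(struct sketch_cache *c,
				  struct sketch_shrink_control *sc)
{
	if (!sc->fs_allowed)
		sc->defer_work = true;
	return c->nr_objects;
}

/* scan: frees up to nr_to_scan objects and makes no policy decisions. */
static unsigned long sketch_scan(struct sketch_cache *c,
				 struct sketch_shrink_control *sc)
{
	unsigned long n = sc->nr_to_scan < c->nr_objects ?
			  sc->nr_to_scan : c->nr_objects;

	c->nr_objects -= n;
	return n;
}

/* Control loop: consult count once, then either defer or iterate scan. */
static unsigned long sketch_do_shrink(struct sketch_cache *c,
				      struct sketch_shrink_control *sc)
{
	unsigned long freed = 0;
	unsigned long total_scan;

	assert(c != NULL && sc != NULL);

	sc->defer_work = false;            /* assumption: reset per invocation */
	total_scan = sketch_count(c, sc);

	if (sc->defer_work) {
		c->nr_deferred += total_scan;  /* account the work, do none of it */
		return 0;
	}

	while (total_scan > 0) {
		unsigned long n;

		sc->nr_to_scan = total_scan < SKETCH_BATCH ?
				 total_scan : SKETCH_BATCH;
		n = sketch_scan(c, sc);
		if (n == 0)
			break;
		freed += n;
		total_scan -= n;
	}
	return freed;
}
```

With fs_allowed false, sketch_do_shrink() frees nothing and only
records the counted objects as deferred; with it true, the same cache
is drained in batches and the scan callback never looks at the policy.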

> I find the argument reasonable enough, but if the above is true, why do
> we move these checks from ->scan_objects() to ->count_objects() (in the
> next patch) when per-object decisions will ultimately need to be made by
> the former?

Because run/no-run policy belongs in one place, and things like
GFP_NOFS do not change across calls to the ->scan loop. i.e. after
the first ->scan call in a loop that calls it hundreds to thousands
of times, the GFP_NOFS run/no-run check is completely redundant.

Once we introduce a new policy that allows the fs shrinker to do
careful reclaim in GFP_NOFS conditions, we need to substantially
rework the shrinker scan loop and how it accounts the work that is
done - we now have at least 3 or 4 different return counters
(skipped because locked, skipped because referenced, reclaimed,
deferred reclaim because couldn't lock/recursion) and the accounting
and decisions to be made are a lot more complex.

In that case, the ->count function will drop the GFP_NOFS check, but
still do all the other things it needs to do. The GFP_NOFS check will
go deep in the guts of the shrinker scan implementation where the
per-object recursion problem exists. But for most shrinkers, it's
still going to be a global boolean check...

> That seems like unnecessary churn and inconsistent with the
> argument against just temporarily doing something like what Christoph
> suggested in the previous version, particularly since IIRC the only use
> in this series was for gfp mask purposes.

If people want to call avoiding repeated, unnecessary evaluation of
the same condition hundreds of times instead of once "unnecessary
churn", then I'll drop it.
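
The redundancy argument can be shown with a toy comparison. This is a
user-space sketch under stated assumptions, not the kernel loop: the
sketch_* names and the policy_checks counter are hypothetical, and the
"policy" is a plain boolean that cannot change between batches, which
is the property the GFP_NOFS check is described as having.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tally of how often the run/no-run policy is evaluated. */
struct sketch_stats {
	unsigned long policy_checks;
	unsigned long scan_calls;
};

/* The invariant policy check; counts each evaluation. */
static bool sketch_fs_reclaim_allowed(bool fs_allowed, struct sketch_stats *st)
{
	assert(st != NULL);
	st->policy_checks++;
	return fs_allowed;
}

/* Variant A: the check lives in scan, so every batch repeats it. */
static struct sketch_stats sketch_check_in_scan(unsigned long batches,
						bool fs_allowed)
{
	struct sketch_stats st = { 0, 0 };

	for (unsigned long i = 0; i < batches; i++) {
		st.scan_calls++;
		if (!sketch_fs_reclaim_allowed(fs_allowed, &st))
			break;  /* comparable to a scan returning SHRINK_STOP */
	}
	return st;
}

/* Variant B: the check lives in count and is made once per invocation. */
static struct sketch_stats sketch_check_in_count(unsigned long batches,
						 bool fs_allowed)
{
	struct sketch_stats st = { 0, 0 };

	if (!sketch_fs_reclaim_allowed(fs_allowed, &st))
		return st;      /* work is deferred, scan is never entered */

	for (unsigned long i = 0; i < batches; i++)
		st.scan_calls++;
	return st;
}
```

For a run of 1000 batches with reclaim allowed, variant A evaluates
the policy 1000 times and variant B once. When reclaim is not allowed,
variant A still has to enter scan once to find that out, while variant
B never enters it.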

> >  include/linux/shrinker.h | 7 +++++++
> >  mm/vmscan.c              | 8 ++++++++
> >  2 files changed, 15 insertions(+)
> >
> > diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> > index 0f80123650e2..3405c39ab92c 100644
> > --- a/include/linux/shrinker.h
> > +++ b/include/linux/shrinker.h
> > @@ -31,6 +31,13 @@ struct shrink_control {
> >
> >  	/* current memcg being shrunk (for memcg aware shrinkers) */
> >  	struct mem_cgroup *memcg;
> > +
> > +	/*
> > +	 * set by ->count_objects if reclaim context prevents reclaim from
> > +	 * occurring. This allows the shrinker to immediately defer all the
> > +	 * work and not even attempt to scan the cache.
> > +	 */
> > +	bool defer_work;
> >  };
> >
> >  #define SHRINK_STOP (~0UL)
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index ee4eecc7e1c2..a215d71d9d4b 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -536,6 +536,13 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> >  	trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
> >  				   freeable, delta, total_scan, priority);
> >
> > +	/*
> > +	 * If the shrinker can't run (e.g. due to gfp_mask constraints), then
> > +	 * defer the work to a context that can scan the cache.
> > +	 */
> > +	if (shrinkctl->defer_work)
> > +		goto done;
> > +
>
> I still find the fact that this per-shrinker invocation field is never
> reset unnecessarily fragile, and I don't see any good reason not to
> reset it prior to the shrinker callback that potentially sets it.

I missed that when updating. I'll reset it in the next version.

-Dave.
-- 
Dave Chinner
david@fromorbit.com