Date: Mon, 16 Jul 2018 16:40:32 -0700
From: Andrew Morton
To: Matthew Wilcox
Cc: Michal Hocko, Dave Chinner, James Bottomley, Linus Torvalds, Waiman Long, Al Viro, Jonathan Corbet, "Luis R. Rodriguez", Kees Cook, Linux Kernel Mailing List, linux-fsdevel, linux-mm, "open list:DOCUMENTATION", Jan Kara, Paul McKenney, Ingo Molnar, Miklos Szeredi, Larry Woodman, "Wangkai (Kevin,C)"
Subject: Re: [PATCH v6 0/7] fs/dcache: Track & limit # of negative dentries
Message-Id: <20180716164032.94e13f765c5f33c6022eca38@linux-foundation.org>
In-Reply-To: <20180716124115.GA7072@bombadil.infradead.org>

On Mon, 16 Jul 2018 05:41:15 -0700 Matthew Wilcox wrote:

> On Mon, Jul 16, 2018 at 11:09:01AM +0200, Michal Hocko wrote:
> > On Fri 13-07-18 10:36:14, Dave Chinner wrote:
> > [...]
> > > By limiting the number of negative dentries in this case, internal
> > > slab fragmentation is reduced such that reclaim cost never gets out
> > > of control. While it appears to "fix" the symptoms, it doesn't
> > > address the underlying problem. It is a partial solution at best but
> > > at worst it's another opaque knob that nobody knows how or when to
> > > tune.
> >
> > Would it help to put all the negative dentries into its own slab cache?
>
> Maybe the dcache should be more sensitive to its own needs. In __d_alloc,
> it could check whether there are a high proportion of negative dentries
> and start recycling some existing negative dentries.
Well, yes. The proposed patchset adds all this background reclaiming. The problems are that a) background reclaiming sometimes can't keep up, so a synchronous direct reclaim was added on top, and b) reclaiming dentries in the background makes tasks which don't allocate dentries suffer for the activity of the tasks which do, which is inappropriate.

I expect a better design is something like

        __d_alloc()
        {
                ...
                while (too many dentries)
                        call the dcache shrinker
                ...
        }

and that's it. This way we have a hard upper limit and only the tasks which are creating dentries suffer the cost.

Regarding the slab page fragmentation issue: I'm wondering whether the whole idea of balancing the slab scan rates against the page scan rates just isn't working out. Maybe shrink_slab() should instead keep hammering the caches until they have freed up a particular number of pages. That would be quite a big change, both conceptually and in implementation.

Aside: about a billion years ago we were having issues with processes getting stuck in direct reclaim because other processes were coming in and stealing away the pages which the direct reclaimer had just freed. One possible solution was to make direct-reclaiming tasks release the freed pages onto a list on the task_struct, so that those pages were invisible to other allocating tasks and were available to the direct reclaimer when it returned from the reclaim effort. I forget what happened to this. It's quite a small code change and would provide a mechanism for implementing the hammer-the-cache-until-you've-freed-enough design above.

Aside 2: if we *do* do something like the above __d_alloc() pseudocode, then perhaps it could be cast in terms of pages, not dentries. i.e.,

        __d_alloc()
        {
                ...
                while (too many pages in dentry_cache)
                        call the dcache shrinker
                ...
        }

and, apart from the external name thing (grr), that should address these fragmentation issues, no? I assume it's easy to ask slab how many pages are presently in use for a particular cache.