Date: Tue, 21 May 2019 11:15:25 +1000
From: "Tobin C. Harding"
To: Roman Gushchin
Harding" , Andrew Morton , Matthew Wilcox , Alexander Viro , Christoph Hellwig , Pekka Enberg , David Rientjes , Joonsoo Kim , Christopher Lameter , Miklos Szeredi , Andreas Dilger , Waiman Long , Tycho Andersen , Theodore Ts'o , Andi Kleen , David Chinner , Nick Piggin , Rik van Riel , Hugh Dickins , Jonathan Corbet , "linux-mm@kvack.org" , "linux-fsdevel@vger.kernel.org" , "linux-kernel@vger.kernel.org" Subject: Re: [RFC PATCH v5 04/16] slub: Slab defrag core Message-ID: <20190521011525.GA25898@eros.localdomain> References: <20190520054017.32299-1-tobin@kernel.org> <20190520054017.32299-5-tobin@kernel.org> <20190521005152.GC21811@tower.DHCP.thefacebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190521005152.GC21811@tower.DHCP.thefacebook.com> X-Mailer: Mutt 1.11.4 (2019-03-13) User-Agent: Mutt/1.11.4 (2019-03-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 21, 2019 at 12:51:57AM +0000, Roman Gushchin wrote: > On Mon, May 20, 2019 at 03:40:05PM +1000, Tobin C. Harding wrote: > > Internal fragmentation can occur within pages used by the slub > > allocator. Under some workloads large numbers of pages can be used by > > partial slab pages. This under-utilisation is bad simply because it > > wastes memory but also because if the system is under memory pressure > > higher order allocations may become difficult to satisfy. If we can > > defrag slab caches we can alleviate these problems. > > > > Implement Slab Movable Objects in order to defragment slab caches. > > > > Slab defragmentation may occur: > > > > 1. Unconditionally when __kmem_cache_shrink() is called on a slab cache > > by the kernel calling kmem_cache_shrink(). > > > > 2. Unconditionally through the use of the slabinfo command. > > > > slabinfo -s > > > > 3. Conditionally via the use of kmem_cache_defrag() > > > > - Use Slab Movable Objects when shrinking cache. > > > > Currently when the kernel calls kmem_cache_shrink() we curate the > > partial slabs list. If object migration is not enabled for the cache we > > still do this, if however, SMO is enabled we attempt to move objects in > > partially full slabs in order to defragment the cache. Shrink attempts > > to move all objects in order to reduce the cache to a single partial > > slab for each node. > > > > - Add conditional per node defrag via new function: > > > > kmem_defrag_slabs(int node). > > > > kmem_defrag_slabs() attempts to defragment all slab caches for > > node. Defragmentation is done conditionally dependent on MAX_PARTIAL > > _and_ defrag_used_ratio. > > > > Caches are only considered for defragmentation if the number of > > partial slabs exceeds MAX_PARTIAL (per node). > > > > Also, defragmentation only occurs if the usage ratio of the slab is > > lower than the configured percentage (sysfs field added in this > > patch). Fragmentation ratios are measured by calculating the > > percentage of objects in use compared to the total number of objects > > that the slab page can accommodate. > > > > The scanning of slab caches is optimized because the defragmentable > > slabs come first on the list. Thus we can terminate scans on the > > first slab encountered that does not support defragmentation. > > > > kmem_defrag_slabs() takes a node parameter. This can either be -1 if > > defragmentation should be performed on all nodes, or a node number. 
> >
> > Defragmentation may be disabled by setting defrag ratio to 0
> >
> >        echo 0 > /sys/kernel/slab/<cache>/defrag_used_ratio
> >
> > - Add a defrag ratio sysfs field and set it to 30% by default. A limit
> >   of 30% specifies that more than 3 out of 10 available slots for
> >   objects need to be in use, otherwise slab defragmentation will be
> >   attempted on the remaining objects.
> >
> > In order for a cache to be defragmentable the cache must support
> > object migration (SMO). Enabling SMO for a cache is done via a call
> > to the recently added function:
> >
> >        void kmem_cache_setup_mobility(struct kmem_cache *,
> >                                       kmem_cache_isolate_func,
> >                                       kmem_cache_migrate_func);
> >
> > Co-developed-by: Christoph Lameter
> > Signed-off-by: Tobin C. Harding
> > ---
> >  Documentation/ABI/testing/sysfs-kernel-slab |  14 +
> >  include/linux/slab.h                        |   1 +
> >  include/linux/slub_def.h                    |   7 +
> >  mm/slub.c                                   | 385 ++++++++++++++++----
> >  4 files changed, 334 insertions(+), 73 deletions(-)
>
> Hi Tobin!
>
> Overall looks very good to me! I'll take another look when you post
> a non-RFC version, but so far I can't find any issues.

Thanks for the reviews.

> A generic question: as I understand it, you only support root
> kmem_caches for now. Is kmemcg support planned?

I know very little about cgroups, so I have no plans for this work
myself. However, I'm not the architect behind this - Christoph is
guiding the direction on this one. Perhaps he will comment.

> Without it the patchset isn't as attractive as it could be to anyone
> using cgroups. Also, I hope it can solve (or mitigate) the
> memcg-specific problem of scattering the vfs cache workingset over
> multiple generations of the same cgroup (their kmem_caches).

I'm keen to work on anything that makes this more useful so I'll do
some research. Thanks for the idea.

Regards,
Tobin.
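
[Editor's note: for readers following the thread, here is a minimal
sketch of how a cache owner might opt a cache into Slab Movable Objects
using the kmem_cache_setup_mobility() call described in the quoted
commit message. Only the function name and the existence of the
isolate/migrate callbacks come from the message above; the callback
signatures, the "foo" cache, and the callback bodies are illustrative
assumptions, not code from the patch series.]

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

/* Hypothetical object type whose cache we want to be defragmentable. */
struct foo {
	int payload;
};

static struct kmem_cache *foo_cache;

/*
 * Isolate: pin the given objects so they cannot be freed or mutated
 * while migration runs; optionally return a private cookie that is
 * handed back to the migrate callback. (Signature assumed.)
 */
static void *foo_isolate(struct kmem_cache *s, void **objs, int nr)
{
	/* e.g. take a reference or lock on each object in objs[0..nr) */
	return NULL;
}

/*
 * Migrate: allocate replacements (ideally on the target node), copy the
 * contents across, repoint any users, then free the old objects so the
 * source slab can be emptied. (Signature assumed.)
 */
static void foo_migrate(struct kmem_cache *s, void **objs, int nr,
			int node, void *private)
{
	int i;

	for (i = 0; i < nr; i++) {
		/* move objs[i] to a fresh allocation and drop the old one */
	}
}

static int __init foo_init(void)
{
	foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0, 0, NULL);
	if (!foo_cache)
		return -ENOMEM;

	/* Opt the cache into Slab Movable Objects. */
	kmem_cache_setup_mobility(foo_cache, foo_isolate, foo_migrate);
	return 0;
}
module_init(foo_init);
```

As I read the quoted description, the two-step split exists so that the
allocator can hold the objects stable during isolation, while the
potentially expensive copying and pointer fix-up in migrate happens
afterwards, outside the allocator's internal locking.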