From: "Tobin C. Harding" <tobin@kernel.org>
To: Andrew Morton
Harding" , Roman Gushchin , Alexander Viro , Christoph Hellwig , Pekka Enberg , David Rientjes , Joonsoo Kim , Christopher Lameter , Matthew Wilcox , Miklos Szeredi , Andreas Dilger , Waiman Long , Tycho Andersen , "Theodore Ts'o" , Andi Kleen , David Chinner , Nick Piggin , Rik van Riel , Hugh Dickins , Jonathan Corbet , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH v3 12/15] slub: Enable balancing slabs across nodes Date: Thu, 11 Apr 2019 11:34:38 +1000 Message-Id: <20190411013441.5415-13-tobin@kernel.org> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190411013441.5415-1-tobin@kernel.org> References: <20190411013441.5415-1-tobin@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org We have just implemented Slab Movable Objects (SMO). On NUMA systems slabs can become unbalanced i.e. many slabs on one node while other nodes have few slabs. Using SMO we can balance the slabs across all the nodes. The algorithm used is as follows: 1. Move all objects to node 0 (this has the effect of defragmenting the cache). 2. Calculate the desired number of slabs for each node (this is done using the approximation nr_slabs / nr_nodes). 3. Loop over the nodes moving the desired number of slabs from node 0 to the node. Feature is conditionally built in with CONFIG_SMO_NODE, this is because we need the full list (we enable SLUB_DEBUG to get this). Future version may separate final list out of SLUB_DEBUG. Expose this functionality to userspace via a sysfs entry. Add sysfs entry: /sysfs/kernel/slab//balance Write of '1' to this file triggers balance, no other value accepted. This feature relies on SMO being enable for the cache, this is done with a call to, after the isolate/migrate functions have been defined. kmem_cache_setup_mobility(s, isolate, migrate) Signed-off-by: Tobin C. Harding --- mm/slub.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 120 insertions(+) diff --git a/mm/slub.c b/mm/slub.c index e4f3dde443f5..a5c48c41d72b 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4583,6 +4583,109 @@ static unsigned long kmem_cache_move_to_node(struct kmem_cache *s, int node) return left; } + +/* + * kmem_cache_move_slabs() - Attempt to move @num slabs to target_node, + * @s: The cache we are working on. + * @node: The node to move objects from. + * @target_node: The node to move objects to. + * @num: The number of slabs to move. + * + * Attempts to move @num slabs from @node to @target_node. This is done + * by migrating objects from slabs on the full_list. + * + * Return: The number of slabs moved or error code. + */ +static long kmem_cache_move_slabs(struct kmem_cache *s, + int node, int target_node, long num) +{ + struct kmem_cache_node *n = get_node(s, node); + LIST_HEAD(move_list); + struct page *page, *page2; + unsigned long flags; + void **scratch; + long done = 0; + + if (node == target_node) + return -EINVAL; + + scratch = alloc_scratch(s); + if (!scratch) + return -ENOMEM; + + spin_lock_irqsave(&n->list_lock, flags); + list_for_each_entry_safe(page, page2, &n->full, lru) { + if (!slab_trylock(page)) + /* Busy slab. 
+			/* Busy slab. Get out of the way */
+			continue;
+
+		list_move(&page->lru, &move_list);
+		page->frozen = 1;
+		slab_unlock(page);
+
+		if (++done >= num)
+			break;
+	}
+	spin_unlock_irqrestore(&n->list_lock, flags);
+
+	list_for_each_entry(page, &move_list, lru) {
+		if (page->inuse)
+			move_slab_page(page, scratch, target_node);
+	}
+	kfree(scratch);
+
+	/* Inspect results and dispose of pages */
+	spin_lock_irqsave(&n->list_lock, flags);
+	list_for_each_entry_safe(page, page2, &move_list, lru) {
+		list_del(&page->lru);
+		slab_lock(page);
+		page->frozen = 0;
+
+		if (page->inuse) {
+			/*
+			 * This is best effort only, if slab still has
+			 * objects just put it back on the partial list.
+			 */
+			n->nr_partial++;
+			list_add_tail(&page->lru, &n->partial);
+			slab_unlock(page);
+		} else {
+			slab_unlock(page);
+			discard_slab(s, page);
+		}
+	}
+	spin_unlock_irqrestore(&n->list_lock, flags);
+
+	return done;
+}
+
+/*
+ * kmem_cache_balance_nodes() - Balance slabs across nodes.
+ * @s: The cache we are working on.
+ */
+static void kmem_cache_balance_nodes(struct kmem_cache *s)
+{
+	struct kmem_cache_node *n = get_node(s, 0);
+	unsigned long desired_nr_slabs_per_node;
+	unsigned long nr_slabs;
+	int nr_nodes = 0;
+	int nid;
+
+	(void)kmem_cache_move_to_node(s, 0);
+
+	for_each_node_state(nid, N_NORMAL_MEMORY)
+		nr_nodes++;
+
+	nr_slabs = atomic_long_read(&n->nr_slabs);
+	desired_nr_slabs_per_node = nr_slabs / nr_nodes;
+
+	for_each_node_state(nid, N_NORMAL_MEMORY) {
+		if (nid == 0)
+			continue;
+
+		kmem_cache_move_slabs(s, 0, nid, desired_nr_slabs_per_node);
+	}
+}
 #endif
 
 /**
@@ -5847,6 +5950,22 @@ static ssize_t move_store(struct kmem_cache *s, const char *buf, size_t length)
 	return length;
 }
 SLAB_ATTR(move);
+
+static ssize_t balance_show(struct kmem_cache *s, char *buf)
+{
+	return 0;
+}
+
+static ssize_t balance_store(struct kmem_cache *s,
+			     const char *buf, size_t length)
+{
+	if (buf[0] == '1')
+		kmem_cache_balance_nodes(s);
+	else
+		return -EINVAL;
+	return length;
+}
+SLAB_ATTR(balance);
 #endif /* CONFIG_SMO_NODE */
 
 #ifdef CONFIG_NUMA
@@ -5975,6 +6094,7 @@ static struct attribute *slab_attrs[] = {
 	&shrink_attr.attr,
 #ifdef CONFIG_SMO_NODE
 	&move_attr.attr,
+	&balance_attr.attr,
 #endif
 	&slabs_cpu_partial_attr.attr,
 #ifdef CONFIG_SLUB_DEBUG
-- 
2.21.0
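
For anyone wanting to try this out, here is a rough sketch of how a cache
might opt in so that the new balance file does something useful.  It is
illustrative only: struct foo, foo_isolate(), foo_migrate() and the callback
prototypes are assumptions loosely based on the isolate()/migrate() methods
added earlier in this series; only kmem_cache_setup_mobility() and the sysfs
file come from these patches.

/* Illustrative only, not part of this patch. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

struct foo {
	int value;
};

static struct kmem_cache *foo_cache;

/*
 * The prototypes below are assumed from the isolate()/migrate()
 * methods introduced earlier in this series; see the earlier patches
 * for the authoritative typedefs.
 */
static void *foo_isolate(struct kmem_cache *s, void **objs, int nr)
{
	/*
	 * Pin the @nr objects in @objs so they cannot be freed while
	 * migration is in progress.  The return value is private data
	 * passed through to foo_migrate().
	 */
	return NULL;
}

static void foo_migrate(struct kmem_cache *s, void **objs, int nr,
			int node, void *private)
{
	/* Allocate replacements on @node, copy, repoint users, free old. */
}

static int __init foo_init(void)
{
	foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0, 0, NULL);
	if (!foo_cache)
		return -ENOMEM;

	/*
	 * Enable SMO for the cache.  Only after this does writing '1'
	 * to the cache's balance file trigger kmem_cache_balance_nodes().
	 */
	kmem_cache_setup_mobility(foo_cache, foo_isolate, foo_migrate);
	return 0;
}
module_init(foo_init);

MODULE_LICENSE("GPL");

With CONFIG_SMO_NODE set, a balance can then be requested from userspace
with something like: echo 1 > /sys/kernel/slab/foo/balance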