Date: Mon, 16 Apr 2018 15:32:35 -0400 (EDT)
From: Mikulas Patocka <mpatocka@redhat.com>
To: Mike Snitzer
cc: Vlastimil Babka, Christopher Lameter, Matthew Wilcox, Pekka Enberg,
    linux-mm@kvack.org, dm-devel@redhat.com, David Rientjes, Joonsoo Kim,
    Andrew Morton, linux-kernel@vger.kernel.org
Subject: [PATCH RESEND] slab: introduce the flag SLAB_MINIMIZE_WASTE
In-Reply-To: <20180416144638.GA22484@redhat.com>
Message-ID: 
References: <20c58a03-90a8-7e75-5fc7-856facfb6c8a@suse.cz> <20180413151019.GA5660@redhat.com> <20180416142703.GA22422@redhat.com> <20180416144638.GA22484@redhat.com>
User-Agent: Alpine 2.02 (LRH 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

This patch introduces a flag SLAB_MINIMIZE_WASTE for slab and slub. The
flag causes allocation of larger slab caches in order to minimize wasted
space.

This is needed because we want to use dm-bufio for the deduplication
index and there are existing installations with non-power-of-two block
sizes (such as 640KB). The performance of the whole solution depends on
efficient memory use, so we must waste as little memory as possible.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-bufio.c |    2 +-
 include/linux/slab.h  |    7 +++++++
 mm/slab.c             |    4 ++--
 mm/slab.h             |    7 ++++---
 mm/slab_common.c      |    2 +-
 mm/slub.c             |   25 ++++++++++++++++++++-----
 6 files changed, 35 insertions(+), 12 deletions(-)

Index: linux-2.6/include/linux/slab.h
===================================================================
--- linux-2.6.orig/include/linux/slab.h	2018-04-16 21:10:45.000000000 +0200
+++ linux-2.6/include/linux/slab.h	2018-04-16 21:10:45.000000000 +0200
@@ -108,6 +108,13 @@
 #define SLAB_KASAN		0
 #endif
 
+/*
+ * Use higher order allocations to minimize wasted space.
+ * Note: the allocation is unreliable if this flag is used, the caller
+ * must handle allocation failures gracefully.
+ */
+#define SLAB_MINIMIZE_WASTE	((slab_flags_t __force)0x10000000U)
+
 /* The following flags affect the page allocator grouping pages by mobility */
 /* Objects are reclaimable */
 #define SLAB_RECLAIM_ACCOUNT	((slab_flags_t __force)0x00020000U)
Index: linux-2.6/mm/slab_common.c
===================================================================
--- linux-2.6.orig/mm/slab_common.c	2018-04-16 21:10:45.000000000 +0200
+++ linux-2.6/mm/slab_common.c	2018-04-16 21:10:45.000000000 +0200
@@ -53,7 +53,7 @@ static DECLARE_WORK(slab_caches_to_rcu_d
 		SLAB_FAILSLAB | SLAB_KASAN)
 
 #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
-			 SLAB_ACCOUNT)
+			 SLAB_ACCOUNT | SLAB_MINIMIZE_WASTE)
 
 /*
  * Merge control. If this is set then no merging of slab caches will occur.
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2018-04-16 21:10:45.000000000 +0200
+++ linux-2.6/mm/slub.c	2018-04-16 21:12:41.000000000 +0200
@@ -3249,7 +3249,7 @@ static inline unsigned int slab_order(un
 	return order;
 }
 
-static inline int calculate_order(unsigned int size, unsigned int reserved)
+static inline int calculate_order(unsigned int size, unsigned int reserved, slab_flags_t flags)
 {
 	unsigned int order;
 	unsigned int min_objects;
@@ -3277,7 +3277,7 @@ static inline int calculate_order(unsign
 			order = slab_order(size, min_objects,
 					slub_max_order, fraction, reserved);
 			if (order <= slub_max_order)
-				return order;
+				goto ret_order;
 			fraction /= 2;
 		}
 		min_objects--;
@@ -3289,15 +3289,30 @@ static inline int calculate_order(unsign
 	 */
 	order = slab_order(size, 1, slub_max_order, 1, reserved);
 	if (order <= slub_max_order)
-		return order;
+		goto ret_order;
 
 	/*
 	 * Doh this slab cannot be placed using slub_max_order.
 	 */
 	order = slab_order(size, 1, MAX_ORDER, 1, reserved);
 	if (order < MAX_ORDER)
-		return order;
+		goto ret_order;
 	return -ENOSYS;
+
+ret_order:
+	if (flags & SLAB_MINIMIZE_WASTE) {
+		/* Increase the order if it decreases waste */
+		int test_order;
+		for (test_order = order + 1; test_order < MAX_ORDER; test_order++) {
+			unsigned long order_objects = ((PAGE_SIZE << order) - reserved) / size;
+			unsigned long test_order_objects = ((PAGE_SIZE << test_order) - reserved) / size;
+			if (test_order_objects >= min(32, MAX_OBJS_PER_PAGE))
+				break;
+			if (test_order_objects > order_objects << (test_order - order))
+				order = test_order;
+		}
+	}
+	return order;
 }
 
 static void
@@ -3562,7 +3577,7 @@ static int calculate_sizes(struct kmem_c
 	if (forced_order >= 0)
 		order = forced_order;
 	else
-		order = calculate_order(size, s->reserved);
+		order = calculate_order(size, s->reserved, flags);
 
 	if ((int)order < 0)
 		return 0;
Index: linux-2.6/drivers/md/dm-bufio.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-bufio.c	2018-04-16 21:10:45.000000000 +0200
+++ linux-2.6/drivers/md/dm-bufio.c	2018-04-16 21:11:23.000000000 +0200
@@ -1683,7 +1683,7 @@ struct dm_bufio_client *dm_bufio_client_
 	    (block_size < PAGE_SIZE || !is_power_of_2(block_size))) {
 		snprintf(slab_name, sizeof slab_name, "dm_bufio_cache-%u", c->block_size);
 		c->slab_cache = kmem_cache_create(slab_name, c->block_size, ARCH_KMALLOC_MINALIGN,
-						  SLAB_RECLAIM_ACCOUNT, NULL);
+						  SLAB_RECLAIM_ACCOUNT | SLAB_MINIMIZE_WASTE, NULL);
 		if (!c->slab_cache) {
 			r = -ENOMEM;
 			goto bad;
Index: linux-2.6/mm/slab.h
===================================================================
--- linux-2.6.orig/mm/slab.h	2018-04-16 21:10:45.000000000 +0200
+++ linux-2.6/mm/slab.h	2018-04-16 21:10:45.000000000 +0200
@@ -142,10 +142,10 @@ static inline slab_flags_t kmem_cache_fl
 #if defined(CONFIG_SLAB)
 #define SLAB_CACHE_FLAGS (SLAB_MEM_SPREAD | SLAB_NOLEAKTRACE | \
 			  SLAB_RECLAIM_ACCOUNT | SLAB_TEMPORARY | \
-			  SLAB_ACCOUNT)
+			  SLAB_ACCOUNT | SLAB_MINIMIZE_WASTE)
 #elif defined(CONFIG_SLUB)
 #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE | SLAB_RECLAIM_ACCOUNT | \
-			  SLAB_TEMPORARY | SLAB_ACCOUNT)
+			  SLAB_TEMPORARY | SLAB_ACCOUNT | SLAB_MINIMIZE_WASTE)
 #else
 #define SLAB_CACHE_FLAGS (0)
 #endif
@@ -164,7 +164,8 @@ static inline slab_flags_t kmem_cache_fl
 			      SLAB_NOLEAKTRACE | \
 			      SLAB_RECLAIM_ACCOUNT | \
 			      SLAB_TEMPORARY | \
-			      SLAB_ACCOUNT)
+			      SLAB_ACCOUNT | \
+			      SLAB_MINIMIZE_WASTE)
 
 bool __kmem_cache_empty(struct kmem_cache *);
 int __kmem_cache_shutdown(struct kmem_cache *);
Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2018-04-16 21:10:45.000000000 +0200
+++ linux-2.6/mm/slab.c	2018-04-16 21:10:45.000000000 +0200
@@ -1790,14 +1790,14 @@ static size_t calculate_slab_order(struc
 		 * as GFP_NOFS and we really don't want to have to be allocating
 		 * higher-order pages when we are unable to shrink dcache.
 		 */
-		if (flags & SLAB_RECLAIM_ACCOUNT)
+		if (flags & SLAB_RECLAIM_ACCOUNT && !(flags & SLAB_MINIMIZE_WASTE))
 			break;
 
 		/*
 		 * Large number of objects is good, but very large slabs are
 		 * currently bad for the gfp()s.
 		 */
-		if (gfporder >= slab_max_order)
+		if (gfporder >= slab_max_order && !(flags & SLAB_MINIMIZE_WASTE))
 			break;
 
 		/*