From: Marco Elver
Date: Wed, 29 Nov 2023 11:35:15 +0100
Subject: Re: [PATCH RFC v3 5/9] mm/slub: add opt-in percpu array cache of objects
To: Vlastimil Babka
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
 Matthew Wilcox, "Liam R. Howlett", Andrew Morton, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, Alexander Potapenko, Dmitry Vyukov,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 maple-tree@lists.infradead.org, kasan-dev@googlegroups.com
References: <20231129-slub-percpu-caches-v3-0-6bcf536772bc@suse.cz> <20231129-slub-percpu-caches-v3-5-6bcf536772bc@suse.cz>
In-Reply-To: <20231129-slub-percpu-caches-v3-5-6bcf536772bc@suse.cz>

On Wed, 29 Nov 2023 at 10:53, Vlastimil Babka wrote:
>
> kmem_cache_setup_percpu_array() will allocate a per-cpu array for
> caching alloc/free objects of given size for the cache. The cache
> has to be created with SLAB_NO_MERGE flag.
>
> When empty, half of the array is filled by an internal bulk alloc
> operation. When full, half of the array is flushed by an internal bulk
> free operation.
>
> The array does not distinguish NUMA locality of the cached objects. If
> an allocation is requested with kmem_cache_alloc_node() with numa node
> not equal to NUMA_NO_NODE, the array is bypassed.
>
> The bulk operations exposed to slab users also try to utilize the array
> when possible, but leave the array empty or full and use the bulk
> alloc/free only to finish the operation itself. If kmemcg is enabled and
> active, bulk freeing skips the array completely as it would be less
> efficient to use it.
>
> The locking scheme is copied from the page allocator's pcplists, based
> on embedded spin locks. Interrupts are not disabled, only preemption
> (cpu migration on RT).
> Trylock is attempted to avoid deadlock due to an
> interrupt; trylock failure means the array is bypassed.
>
> Sysfs stat counters alloc_cpu_cache and free_cpu_cache count objects
> allocated or freed using the percpu array; counters cpu_cache_refill and
> cpu_cache_flush count objects refilled or flushed from the array.
>
> kmem_cache_prefill_percpu_array() can be called to fill the array on
> the current cpu to at least the given number of objects. However this is
> only opportunistic as there's no cpu pinning between the prefill and
> usage, and trylocks may fail when the usage is in an irq handler.
> Therefore allocations cannot rely on the array for success even after
> the prefill. But misses should be rare enough that e.g. GFP_ATOMIC
> allocations should be acceptable after the refill.
>
> When slub_debug is enabled for a cache with percpu array, the objects in
> the array are considered as allocated from the slub_debug perspective,
> and the alloc/free debugging hooks occur when moving the objects between
> the array and slab pages. This means that e.g. a use-after-free that
> occurs for an object cached in the array is undetected. Collected
> alloc/free stacktraces might also be less useful. This limitation could
> be changed in the future.
>
> On the other hand, KASAN, kmemcg and other hooks are executed on actual
> allocations and frees by kmem_cache users even if those use the array,
> so their debugging or accounting accuracy should be unaffected.
>
> Signed-off-by: Vlastimil Babka
> ---
>  include/linux/slab.h     |   4 +
>  include/linux/slub_def.h |  12 ++
>  mm/Kconfig               |   1 +
>  mm/slub.c                | 457 ++++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 468 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index d6d6ffeeb9a2..fe0c0981be59 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -197,6 +197,8 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
>  void kmem_cache_destroy(struct kmem_cache *s);
>  int kmem_cache_shrink(struct kmem_cache *s);
>
> +int kmem_cache_setup_percpu_array(struct kmem_cache *s, unsigned int count);
> +
>  /*
>   * Please use this macro to create slab caches. Simply specify the
>   * name of the structure and maybe some flags that are listed above.
> @@ -512,6 +514,8 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
>  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
>  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
>
> +int kmem_cache_prefill_percpu_array(struct kmem_cache *s, unsigned int count, gfp_t gfp);
> +
>  static __always_inline void kfree_bulk(size_t size, void **p)
>  {
>         kmem_cache_free_bulk(NULL, size, p);
>  }
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index deb90cf4bffb..2083aa849766 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -13,8 +13,10 @@
>  #include
>
>  enum stat_item {
> +       ALLOC_PCA,              /* Allocation from percpu array cache */
>         ALLOC_FASTPATH,         /* Allocation from cpu slab */
>         ALLOC_SLOWPATH,         /* Allocation by getting a new cpu slab */
> +       FREE_PCA,               /* Free to percpu array cache */
>         FREE_FASTPATH,          /* Free to cpu slab */
>         FREE_SLOWPATH,          /* Freeing not to cpu slab */
>         FREE_FROZEN,            /* Freeing to frozen slab */
> @@ -39,6 +41,8 @@ enum stat_item {
>         CPU_PARTIAL_FREE,       /* Refill cpu partial on free */
>         CPU_PARTIAL_NODE,       /* Refill cpu partial from node partial */
>         CPU_PARTIAL_DRAIN,      /* Drain cpu partial to node partial */
> +       PCA_REFILL,             /* Refilling empty percpu array cache */
> +       PCA_FLUSH,              /* Flushing full percpu array cache */
>         NR_SLUB_STAT_ITEMS
>  };
>
> @@ -66,6 +70,13 @@ struct kmem_cache_cpu {
>  };
>  #endif /* CONFIG_SLUB_TINY */
>
> +struct slub_percpu_array {
> +       spinlock_t lock;
> +       unsigned int count;
> +       unsigned int used;
> +       void * objects[];

checkpatch complains: "foo * bar" should be "foo *bar"