From: Vlastimil Babka <vbabka@suse.cz>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Christoph Lameter, Pekka Enberg,
    David Rientjes, Joonsoo Kim, Andrew Morton, Jann Horn,
    Vlastimil Babka
Subject: [PATCH] mm, slub: splice cpu and page freelists in deactivate_slab()
Date: Fri, 15 Jan 2021 19:35:43 +0100
Message-Id: <20210115183543.15097-1-vbabka@suse.cz>

In deactivate_slab() we currently move all but one of the objects on the
cpu freelist to the page freelist one by one, using the costly
cmpxchg_double() operation for each of them. Then we unfreeze the page
while moving the last object onto the page freelist, with a final
cmpxchg_double(). This can be optimized to avoid the cmpxchg_double()
per object.
Just count the objects on the cpu freelist (to adjust page->inuse
properly) and remember the last object in the chain. Then splice
page->freelist onto that last object, effectively adding the whole cpu
freelist to page->freelist while unfreezing the page, with a single
cmpxchg_double().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Hi,

I stumbled on this optimization while pondering what to do about the
percpu partial list memory wastage [1], but it should be useful on its
own. I haven't run any measurements yet, but eliminating per-object
cmpxchg_double() operations should be obviously faster [TM]. Passed some
basic testing, including hardened freelist and slub_debug.

[1] https://lore.kernel.org/linux-mm/CAG48ez2Qx5K1Cab-m8BdSibp6wLTip6ro4=-umR7BLsEgjEYzA@mail.gmail.com/

(For illustration, a stand-alone user-space sketch of the splice idea is
appended after the patch.)

 mm/slub.c | 59 ++++++++++++++++++++++-----------------------------------
 1 file changed, 24 insertions(+), 35 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0d4bdf6783ee..c3141aa962be 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2167,9 +2167,9 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 {
 	enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
-	int lock = 0;
+	int lock = 0, free_delta = 0;
 	enum slab_modes l = M_NONE, m = M_NONE;
-	void *nextfree;
+	void *nextfree, *freelist_iter, *freelist_tail;
 	int tail = DEACTIVATE_TO_HEAD;
 	struct page new;
 	struct page old;
@@ -2180,45 +2180,34 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 	}
 
 	/*
-	 * Stage one: Free all available per cpu objects back
-	 * to the page freelist while it is still frozen. Leave the
-	 * last one.
-	 *
-	 * There is no need to take the list->lock because the page
-	 * is still frozen.
+	 * Stage one: Count the objects on cpu's freelist as free_delta and
+	 * remember the last object in freelist_tail for later splicing.
 	 */
-	while (freelist && (nextfree = get_freepointer(s, freelist))) {
-		void *prior;
-		unsigned long counters;
+	freelist_tail = NULL;
+	freelist_iter = freelist;
+	while (freelist_iter) {
+		nextfree = get_freepointer(s, freelist_iter);
 
 		/*
 		 * If 'nextfree' is invalid, it is possible that the object at
-		 * 'freelist' is already corrupted. So isolate all objects
-		 * starting at 'freelist'.
+		 * 'freelist_iter' is already corrupted. So isolate all objects
+		 * starting at 'freelist_iter' by skipping them.
 		 */
-		if (freelist_corrupted(s, page, &freelist, nextfree))
+		if (freelist_corrupted(s, page, &freelist_iter, nextfree))
 			break;
 
-		do {
-			prior = page->freelist;
-			counters = page->counters;
-			set_freepointer(s, freelist, prior);
-			new.counters = counters;
-			new.inuse--;
-			VM_BUG_ON(!new.frozen);
+		freelist_tail = freelist_iter;
+		free_delta++;
 
-		} while (!__cmpxchg_double_slab(s, page,
-			prior, counters,
-			freelist, new.counters,
-			"drain percpu freelist"));
-
-		freelist = nextfree;
+		freelist_iter = nextfree;
 	}
 
 	/*
-	 * Stage two: Ensure that the page is unfrozen while the
-	 * list presence reflects the actual number of objects
-	 * during unfreeze.
+	 * Stage two: Unfreeze the page while splicing the per-cpu
+	 * freelist to the head of page's freelist.
+	 *
+	 * Ensure that the page is unfrozen while the list presence
+	 * reflects the actual number of objects during unfreeze.
 	 *
 	 * We setup the list membership and then perform a cmpxchg
 	 * with the count. If there is a mismatch then the page
@@ -2231,15 +2220,15 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 	 */
 redo:
 
-	old.freelist = page->freelist;
-	old.counters = page->counters;
+	old.freelist = READ_ONCE(page->freelist);
+	old.counters = READ_ONCE(page->counters);
 	VM_BUG_ON(!old.frozen);
 
 	/* Determine target state of the slab */
 	new.counters = old.counters;
-	if (freelist) {
-		new.inuse--;
-		set_freepointer(s, freelist, old.freelist);
+	if (freelist_tail) {
+		new.inuse -= free_delta;
+		set_freepointer(s, freelist_tail, old.freelist);
 		new.freelist = freelist;
 	} else
 		new.freelist = old.freelist;
-- 
2.29.2
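
For illustration only, here is a minimal stand-alone user-space sketch
contrasting the old per-object drain with the new splice. This is not
the kernel code: the toy_page/object types and the drain_*() functions
are invented for this sketch, and a plain compare-and-swap on the
freelist head stands in for the kernel's cmpxchg_double_slab(), which
additionally updates page->counters in the same atomic operation.

#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

struct object {
	struct object *next;	/* free pointer, as in get/set_freepointer() */
};

struct toy_page {
	_Atomic(struct object *) freelist;	/* page freelist head */
	int inuse;				/* objects still allocated */
};

/*
 * Old scheme: push each cpu-freelist object onto the page freelist
 * individually - one CAS loop per object, O(n) atomic operations.
 */
static void drain_per_object(struct toy_page *page, struct object *cpu_freelist)
{
	while (cpu_freelist) {
		struct object *next = cpu_freelist->next;
		struct object *old = atomic_load(&page->freelist);

		do {
			cpu_freelist->next = old;
		} while (!atomic_compare_exchange_weak(&page->freelist, &old,
						       cpu_freelist));
		page->inuse--;	/* the kernel folds this into the cmpxchg */
		cpu_freelist = next;
	}
}

/*
 * New scheme: one non-atomic pass counts the objects and finds the
 * tail, then a single CAS splices the whole chain onto the page
 * freelist head.
 */
static void drain_splice(struct toy_page *page, struct object *cpu_freelist)
{
	struct object *tail = NULL;
	struct object *old;
	int free_delta = 0;

	for (struct object *p = cpu_freelist; p; p = p->next) {
		tail = p;		/* remember the last object */
		free_delta++;
	}
	if (!tail)
		return;

	old = atomic_load(&page->freelist);
	do {
		tail->next = old;	/* splice old freelist behind the tail */
	} while (!atomic_compare_exchange_weak(&page->freelist, &old,
					       cpu_freelist));
	page->inuse -= free_delta;
}

int main(void)
{
	struct object objs[4];
	struct toy_page page = { .inuse = 4 };

	atomic_init(&page.freelist, NULL);
	/* cpu freelist: objs[0] -> objs[1] -> objs[2] -> objs[3] */
	for (int i = 0; i < 3; i++)
		objs[i].next = &objs[i + 1];
	objs[3].next = NULL;

	drain_per_object(&page, &objs[0]);	/* old way: 4 CAS operations */
	printf("inuse after per-object drain: %d\n", page.inuse);	/* 0 */

	/* Reset and drain again with the splice: a single CAS. */
	page.inuse = 4;
	atomic_store(&page.freelist, NULL);
	for (int i = 0; i < 3; i++)
		objs[i].next = &objs[i + 1];
	objs[3].next = NULL;

	drain_splice(&page, &objs[0]);
	printf("inuse after splice drain:     %d\n", page.inuse);	/* 0 */
	return 0;
}

The splice variant trades one atomic operation per freed object for a
single atomic operation per deactivation; the linear walk it does
instead only reads the cpu freelist, which is effectively private to
this CPU at deactivation time.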