Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem
From: Vlastimil Babka
Date: Mon, 15 Mar 2021 19:49:57 +0100
To: Xunlei Pang, Christoph Lameter, Pekka Enberg, Roman Gushchin,
    Konstantin Khlebnikov, David Rientjes, Matthew Wilcox, Shu Ming,
    Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Wen Yang,
    James Wang, Thomas Gleixner
Message-ID: <793c884a-9d60-baaf-fab8-3e5f4a024124@suse.cz>
In-Reply-To: <1615303512-35058-1-git-send-email-xlpang@linux.alibaba.com>
References: <1615303512-35058-1-git-send-email-xlpang@linux.alibaba.com>

On 3/9/21 4:25 PM, Xunlei Pang wrote:
> count_partial() can hold n->list_lock spinlock for quite long, which
> makes much trouble to the system.
> This series eliminates this problem.

Before I check the details, I have two high-level comments:

- patch 1 introduces a counting scheme that patch 4 then changes; could we
  do this in one step to avoid the churn?

- the series addresses the concern that the spinlock is being held, but
  doesn't address the fact that counting per-node partial slabs is not
  nearly enough if we want accurate counts in /proc/slabinfo, because
  there are also percpu slabs and per-cpu partial slabs, where we don't
  track the free objects at all. So after this series, readers of
  /proc/slabinfo won't block on the spinlock, but they will get the same
  garbage data as before. So Christoph is not wrong to say that we can
  just report active_objs == num_objs and it won't actually break any ABI.
  At the same time somebody might actually want accurate object statistics
  at the expense of peak performance, and it would be nice to give them
  such an option in SLUB. Right now we don't provide this accuracy even
  with CONFIG_SLUB_STATS, although that option provides many additional
  tuning stats, with additional overhead.
  So my proposal would be a new config for "accurate active objects" (or
  just tie it to CONFIG_SLUB_DEBUG?) that would extend the approach of
  percpu counters in patch 4 to all allocs/frees, so that it includes
  percpu slabs. Without this config enabled, let's just report
  active_objs == num_objs.

Vlastimil

> v1->v2:
> - Improved changelog and variable naming for PATCH 1~2.
> - PATCH3 adds per-cpu counter to avoid performance regression
>   in concurrent __slab_free().
>
> v2->v3:
> - Changed "page->inuse" to the safe "new.inuse", etc.
> - Used CONFIG_SLUB_DEBUG and CONFIG_SYSFS condition for new counters.
> - atomic_long_t -> unsigned long
>
> [Testing]
> There might be a little performance impact under extreme
> __slab_free() concurrent calls according to my tests.
>
> On my 32-cpu 2-socket physical machine:
> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>
> 1) perf stat --null --repeat 10 -- hackbench 20 thread 20000
>
> == original, no patched
>  Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>
>       24.536050899 seconds time elapsed  ( +- 0.24% )
>
>  Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>
>       24.588049142 seconds time elapsed  ( +- 0.35% )
>
> == patched with patch1~4
>  Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>
>       24.670892273 seconds time elapsed  ( +- 0.29% )
>
>  Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>
>       24.746755689 seconds time elapsed  ( +- 0.21% )
>
> 2) perf stat --null --repeat 10 -- hackbench 32 thread 20000
>
> == original, no patched
>  Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>
>       39.784911855 seconds time elapsed  ( +- 0.14% )
>
>  Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>
>       39.868687608 seconds time elapsed  ( +- 0.19% )
>
> == patched with patch1~4
>  Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>
>       39.681273015 seconds time elapsed  ( +- 0.21% )
>
>  Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
>
>       39.681238459 seconds time elapsed  ( +- 0.09% )
>
> Xunlei Pang (4):
>   mm/slub: Introduce two counters for partial objects
>   mm/slub: Get rid of count_partial()
>   percpu: Export per_cpu_sum()
>   mm/slub: Use percpu partial free counter
>
>  include/linux/percpu-defs.h   |  10 ++++
>  kernel/locking/percpu-rwsem.c |  10 ----
>  mm/slab.h                     |   4 ++
>  mm/slub.c                     | 120 +++++++++++++++++++++++++++++-------------
>  4 files changed, 97 insertions(+), 47 deletions(-)
>