Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem
From: Xunlei Pang
Reply-To: xlpang@linux.alibaba.com
To: Vlastimil Babka, xlpang@linux.alibaba.com, Christoph Lameter, Pekka Enberg, Roman Gushchin, Konstantin Khlebnikov, David Rientjes, Matthew Wilcox, Shu Ming, Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Wen Yang, James Wang, Thomas Gleixner
References: <1615303512-35058-1-git-send-email-xlpang@linux.alibaba.com> <793c884a-9d60-baaf-fab8-3e5f4a024124@suse.cz> <1b4f7296-cd26-7177-873b-a35f5504ccfb@linux.alibaba.com>
Message-ID: <9ea6829a-bf10-4c24-bc8c-492862a76b54@linux.alibaba.com>
Date: Tue, 16 Mar 2021 19:49:33 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/16/21 7:02 PM, Vlastimil Babka wrote:
> On 3/16/21 11:42 AM, Xunlei Pang wrote:
>> On 3/16/21 2:49 AM, Vlastimil Babka wrote:
>>> On 3/9/21 4:25 PM, Xunlei Pang wrote:
>>>> count_partial() can hold the n->list_lock spinlock for quite a long
>>>> time, which causes much trouble for the system. This series eliminates
>>>> this problem.
>>>
>>> Before I check the details, I have two high-level comments:
>>>
>>> - patch 1 introduces some counting scheme that patch 4 then changes;
>>> could we do this in one step to avoid the churn?
>>>
>>> - the series addresses the concern that the spinlock is being held, but
>>> doesn't address the fact that counting partial per-node slabs is not
>>> nearly enough if we want accurate active_objs in /proc/slabinfo, because
>>> there are also percpu slabs and per-cpu partial slabs, where we don't
>>> track the free objects at all. So after this series, while the readers
>>> of /proc/slabinfo won't block on the spinlock, they will get the same
>>> garbage data as before. So Christoph is not wrong to say that we can
>>> just report active_objs == num_objs and it won't actually break any ABI.
>>
>> If the maintainers don't mind this inaccuracy (I also doubt its
>> importance), then it becomes easy. In case some people really care,
>> introducing an extra config option (default off) for it would be a good
>> choice.
>
> Great.
>
>>> At the same time, somebody might actually want accurate object
>>> statistics at the expense of peak performance, and it would be nice to
>>> give them such an option in SLUB. Right now we don't provide this
>>> accuracy even with CONFIG_SLUB_STATS, although that option provides many
>>> additional tuning stats, with additional overhead.
>>> So my proposal would be a new config for "accurate active objects" (or
>>> just tie it to CONFIG_SLUB_DEBUG?) that would extend the approach of
>>> percpu counters in patch 4 to all alloc/free, so that it includes percpu
>>> slabs. Without this config enabled, let's just report
>>> active_objs == num_objs.
>>
>> For percpu slabs, the numbers can be retrieved from the existing
>> slub_percpu_partial()->pobjects, so it looks like no extra work is
>> needed.
>
> Hm, unfortunately it's not that simple; the number there is a snapshot
> that can become wildly inaccurate afterwards.

It's hard to make it absolutely accurate with percpu counters: the data can
change while we iterate over all the CPUs and sum up the objects, not to
mention the percpu freelist cache. I can't imagine a real-world use for such
accuracy. I think the sysfs slabs_cpu_partial attribute works well enough
for common debugging purposes.