Reply-To: xlpang@linux.alibaba.com
Subject: Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial objects
To: Pekka Enberg
Cc: Christoph Lameter, Andrew Morton, Wen Yang, Yang Shi, Roman Gushchin,
    "linux-mm@kvack.org", LKML
References: <1593678728-128358-1-git-send-email-xlpang@linux.alibaba.com>
From: xunlei <xlpang@linux.alibaba.com>
Message-ID: <7374a9fd-460b-1a51-1ab4-25170337e5f2@linux.alibaba.com>
Date: Fri, 3 Jul 2020 17:37:52 +0800

On 2020/7/2 PM 7:59, Pekka Enberg wrote:
> On Thu, Jul 2, 2020 at 11:32 AM Xunlei Pang wrote:
>> The node list_lock in count_partial() is held for a long time while
>> iterating large partial page lists, which can cause a thundering-herd
>> effect on list_lock contention, e.g. it causes business response-time
>> jitters when "/proc/slabinfo" is accessed in our production
>> environments.
>
> Would you have any numbers to share to quantify this jitter? I have no

We have HSF RT (High-speed Service Framework Response-Time) monitors,
and the RT figures fluctuated randomly, so we deployed a tool that
detects "irq off" and "preempt off" periods and dumps the culprit's
calltrace. It caught the list_lock being held with irqs off for up to
100ms, triggered by "ss" reading "/proc/slabinfo", which also caused
network timeouts. (The count_partial() loop in question is quoted at
the end of this mail for reference.)

> objections to this approach, but I think the original design
> deliberately made reading "/proc/slabinfo" more expensive to avoid
> atomic operations in the allocation/deallocation paths. It would be
> good to understand what is the gain of this approach before we switch
> to it. Maybe even run some slab-related benchmark (not sure if there's
> something better than hackbench these days) to see if the overhead of
> this approach shows up.

I considered that before, but most of the atomic operations are already
serialized by the list_lock. Another possible way is to also hold
list_lock in __slab_free(), so the two counters could be changed from
atomics to plain longs (a rough sketch is at the end of this mail).

I also don't know what the standard SLUB benchmark for regression
testing is; any specific suggestion?

>
>> This patch introduces two counters to maintain the actual number
>> of partial objects dynamically instead of iterating the partial
>> page lists with list_lock held.
>>
>> New counters of kmem_cache_node are: pfree_objects, ptotal_objects.
>> The main operations are under list_lock in the slow path, so the
>> performance impact is minimal.
>>
>> Co-developed-by: Wen Yang
>> Signed-off-by: Xunlei Pang
>> ---
>>  mm/slab.h |  2 ++
>>  mm/slub.c | 38 +++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/slab.h b/mm/slab.h
>> index 7e94700..5935749 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -616,6 +616,8 @@ struct kmem_cache_node {
>>  #ifdef CONFIG_SLUB
>>  	unsigned long nr_partial;
>>  	struct list_head partial;
>> +	atomic_long_t pfree_objects;   /* partial free objects */
>> +	atomic_long_t ptotal_objects;  /* partial total objects */
>
> You could rename these to "nr_partial_free_objs" and
> "nr_partial_total_objs" for readability.

Sounds good. Thanks!

>
> - Pekka
>
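
For reference, the iteration that shows up in our irq-off traces is
count_partial(), which walks every page on the per-node partial list
with list_lock held and interrupts disabled; roughly (current mm/slub.c,
quoted here only for context):

static unsigned long count_partial(struct kmem_cache_node *n,
					int (*get_count)(struct page *))
{
	unsigned long flags;
	unsigned long x = 0;
	struct page *page;

	spin_lock_irqsave(&n->list_lock, flags);
	list_for_each_entry(page, &n->partial, slab_list)
		x += get_count(page);
	spin_unlock_irqrestore(&n->list_lock, flags);
	return x;
}

When the partial list is very long, this loop is what accounts for the
~100ms irq-off periods mentioned above.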
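
And here is the rough sketch of the alternative I mentioned above: keep
the counters as plain longs and only update them with n->list_lock held
(which would also mean taking list_lock in __slab_free()). This is just
an illustration, not the posted patch; the helper name is made up and
the field names follow your suggested renaming:

/* Sketch only: counters protected by n->list_lock instead of atomics. */
struct kmem_cache_node {
	spinlock_t list_lock;
	unsigned long nr_partial;
	struct list_head partial;
	long nr_partial_free_objs;	/* updated under list_lock only */
	long nr_partial_total_objs;	/* updated under list_lock only */
	/* ... */
};

/* Illustrative helper; every caller must hold n->list_lock. */
static inline void partial_counters_add(struct kmem_cache_node *n,
					long free_delta, long total_delta)
{
	lockdep_assert_held(&n->list_lock);
	n->nr_partial_free_objs += free_delta;
	n->nr_partial_total_objs += total_delta;
}

count_partial() could then simply read the two fields instead of walking
the list, at the cost of extra list_lock acquisitions in __slab_free().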