Received: by 2002:a25:23cc:0:0:0:0:0 with SMTP id j195csp359540ybj; Mon, 4 May 2020 23:24:25 -0700 (PDT) X-Google-Smtp-Source: APiQypKiQzEBQuewriuuDOcbMbzvgLeywP3Qg5Uy4KTSv2TEAziJyw+RNGIwVx5lpdN+qzD1FNcV X-Received: by 2002:aa7:c492:: with SMTP id m18mr1197262edq.346.1588659865357; Mon, 04 May 2020 23:24:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588659865; cv=none; d=google.com; s=arc-20160816; b=UT2MypUe1p1Hxk25aVpfYi5OkqpF74AilWZIInB7O7xEB+OTrrdZ3iuOyOIPriU6UV aplCMtHxYyAr+Hj3y+CNiGm/e7cWol2lEkaTX1V081y4859rzjmODS5KYmtkEVzk2iwq keV6vfSqLtHsb4s71nn5egpQIX8WFnvFYC4HsoZDdDdw02IAi1wIpT1JxFywYHjZBHfc cfz/fqfqIfx98koNqigoPQN8nMuzAOKv8eDMbPZFMq7KYcIFs8BLFCpTvVtNSxovLXAU kBzp/Mq925EEd7yhpKp0UTrbJYYvCgMe0INXTTfGWyv+6SNWzEVohzMmx/YwUAovqSL3 UqaQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=UHG2ZtUAk5L53n9yzMVjWb3inH3CddC60PlPMECd9bY=; b=1IIz4GITx8KKT9TV0+qfXglGvFofwFybzPWUVtd14/Se2OVuBtZtNlFzAWwSYDIZRE S8OCVR8NtbkWw+MqfV09SbEO0jTjcUVqIUu325Li91lCeOCA7A+kEZhtBh70+Gaw0xK2 RahEDAKwHJr9J26UvWqioTbfaSvXflyZxk6UoNdDOuRs2keKQAguURyjgTlHJpQhi1qW /PLwVEOpxrjH3nQe5uMemV1ZL0HgfOzdgpnfDW0Tznw/EdhC77WfK3QtwRYyVvS2N8E5 wxBIG/iKFt4JcEKq5z3nftNpUS0cVPEGLPyim5qf8Ia56buOuwpxgGVChxdoT+OlFJsw gjzQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@yandex-team.ru header.s=default header.b=QpdrQKhf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=yandex-team.ru Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f3si685163edn.165.2020.05.04.23.24.02; Mon, 04 May 2020 23:24:25 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@yandex-team.ru header.s=default header.b=QpdrQKhf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=yandex-team.ru Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726345AbgEEGUb (ORCPT + 99 others); Tue, 5 May 2020 02:20:31 -0400 Received: from forwardcorp1o.mail.yandex.net ([95.108.205.193]:52770 "EHLO forwardcorp1o.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725320AbgEEGUb (ORCPT ); Tue, 5 May 2020 02:20:31 -0400 Received: from mxbackcorp1o.mail.yandex.net (mxbackcorp1o.mail.yandex.net [IPv6:2a02:6b8:0:1a2d::301]) by forwardcorp1o.mail.yandex.net (Yandex) with ESMTP id A6D5B2E146B; Tue, 5 May 2020 09:20:24 +0300 (MSK) Received: from myt4-18a966dbd9be.qloud-c.yandex.net (myt4-18a966dbd9be.qloud-c.yandex.net [2a02:6b8:c00:12ad:0:640:18a9:66db]) by mxbackcorp1o.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id jRyCxRebX9-KNbW608l; Tue, 05 May 2020 09:20:24 +0300 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1588659624; bh=UHG2ZtUAk5L53n9yzMVjWb3inH3CddC60PlPMECd9bY=; h=In-Reply-To:Message-ID:From:Date:References:To:Subject:Cc; b=QpdrQKhf49jn0PcjbsTwKeUhLXiisTH2LZLPEkz3WN3WInWIRIwj1Cz/IMcP/vI6g PWFO5HplqMQ7b2KkI1+bvL8iheJOoBWAggN0GEAcpAq3zLtHeBaL7rwapC0Lzu3KMc GpnHCUtXbmow9G/vBQp2ahpOPZYKxPmJsWCFcvAc= Authentication-Results: mxbackcorp1o.mail.yandex.net; dkim=pass header.i=@yandex-team.ru Received: from dynamic-vpn.dhcp.yndx.net (dynamic-vpn.dhcp.yndx.net [2a02:6b8:b080:7201::1:2]) by myt4-18a966dbd9be.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id h9MkKKOG5c-KMWiAFd5; Tue, 05 May 2020 09:20:23 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present) Subject: Re: [PATCH] slub: limit count of partial slabs scanned to gather statistics To: David Rientjes Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton , Christoph Lameter , Pekka Enberg , Joonsoo Kim References: <158860845968.33385.4165926113074799048.stgit@buzz> From: Konstantin Khlebnikov Message-ID: <57ff4c05-bf97-b546-44a3-163cfbe7cad4@yandex-team.ru> Date: Tue, 5 May 2020 09:20:22 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-CA Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/05/2020 00.19, David Rientjes wrote: > On Mon, 4 May 2020, Konstantin Khlebnikov wrote: > >> To get exact count of free and used objects slub have to scan list of >> partial slabs. This may take at long time. Scanning holds spinlock and >> blocks allocations which move partial slabs to per-cpu lists and back. >> >> Example found in the wild: >> >> # cat /sys/kernel/slab/dentry/partial >> 14478538 N0=7329569 N1=7148969 >> # time cat /sys/kernel/slab/dentry/objects >> 286225471 N0=136967768 N1=149257703 >> >> real 0m1.722s >> user 0m0.001s >> sys 0m1.721s >> >> The same problem in slab was addressed in commit f728b0a5d72a ("mm, slab: >> faster active and free stats") by adding more kmem cache statistics. >> For slub same approach requires atomic op on fast path when object frees. >> >> Let's simply limit count of scanned slabs and print warning. >> Limit set in /sys/module/slub/parameters/max_partial_to_count. >> Default is 10000 which should be enough for most sane cases. >> >> Return linear approximation if list of partials is longer than limit. >> Nobody should notice difference. >> > > Hi Konstantin, > > Do you only exhibit this on slub for SO_ALL|SO_OBJECTS? I notice the > timing in the changelog is only looking at "objects" and not "partial". "partial" is a count of partial slabs which simply sums per-numa counters. Affected only "objects" and "objects_partial" which walk the list. > > If so, it seems this is also a problem for get_slabinfo() since it also > uses the count_free() callback for count_partial(). Yep, /proc/slabinfo also affected. Actually it's more affected than sysfs - it walks all cgroups while sysfs shows only root. > > Concern would be that the kernel has now drastically changed a statistic > that it exports to userspace. There was some discussion about this back > in 2016[*] and one idea was that slabinfo would truncate its scanning and > append a '+' to the end of the value to indicate it exceeds the max, i.e. > 10000+. I think that '+' actually caused the problem itself for userspace > processes. Yep, "+" will break everything for sure. I thought about returning "-1" or INT_MAX without counting, but approximation gives almost correct result without breaking anything. Each partial slab has at least one used and free object thus approximated result will be somewhere between nr_partial_slabs and nr_partial_objects. > > I think the patch is too far reaching, however, since it impacts all > count_partial() counting and not only for the case cited in the changelog. > Are there examples for things other than the count_free() callback? Nope, this is just a statistics for used/free objects. Total count of objects and slabs are counted precisely. > > [*] https://lore.kernel.org/patchwork/patch/708427/ > >> Signed-off-by: Konstantin Khlebnikov >> --- >> mm/slub.c | 15 ++++++++++++++- >> 1 file changed, 14 insertions(+), 1 deletion(-) >> >> diff --git a/mm/slub.c b/mm/slub.c >> index 9bf44955c4f1..86a366f7acb6 100644 >> --- a/mm/slub.c >> +++ b/mm/slub.c >> @@ -2407,16 +2407,29 @@ static inline unsigned long node_nr_objs(struct kmem_cache_node *n) >> #endif /* CONFIG_SLUB_DEBUG */ >> >> #if defined(CONFIG_SLUB_DEBUG) || defined(CONFIG_SYSFS) >> + >> +static unsigned long max_partial_to_count __read_mostly = 10000; >> +module_param(max_partial_to_count, ulong, 0644); >> + >> static unsigned long count_partial(struct kmem_cache_node *n, >> int (*get_count)(struct page *)) >> { >> + unsigned long counted = 0; >> unsigned long flags; >> unsigned long x = 0; >> struct page *page; >> >> spin_lock_irqsave(&n->list_lock, flags); >> - list_for_each_entry(page, &n->partial, slab_list) >> + list_for_each_entry(page, &n->partial, slab_list) { >> x += get_count(page); >> + >> + if (++counted > max_partial_to_count) { >> + pr_warn_once("SLUB: too much partial slabs to count all objects, increase max_partial_to_count.\n"); >> + /* Approximate total count of objects */ >> + x = mult_frac(x, n->nr_partial, counted); >> + break; >> + } >> + } >> spin_unlock_irqrestore(&n->list_lock, flags); >> return x; >> } >> >>