Date: Sat, 2 Dec 2023 07:48:50 +0000
In-Reply-To: <20231129032154.3710765-4-yosryahmed@google.com>
Mime-Version: 1.0
References: <20231129032154.3710765-1-yosryahmed@google.com>
 <20231129032154.3710765-4-yosryahmed@google.com>
Message-ID: <20231202074850.aisqdvyc5u2kth6r@google.com>
Subject: Re: [mm-unstable v4 3/5] mm: memcg: make stats flushing threshold per-memcg
From: Shakeel Butt
To: Yosry Ahmed
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
 Ivan Babrou, Tejun Heo, Michal Koutný, Waiman Long, kernel-team@cloudflare.com,
 Wei Xu, Greg Thelen, Domenico Cerasuolo, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Wed, Nov 29, 2023 at 03:21:51AM +0000, Yosry Ahmed wrote:
> A global counter for the magnitude of memcg stats updates is maintained
> on the memcg side to avoid invoking rstat flushes when the pending
> updates are not significant. This avoids unnecessary flushes, which are
> not very cheap even if there aren't a lot of stats to flush. It also
> avoids unnecessary lock contention on the underlying global rstat lock.
>
> Make this threshold per-memcg. The usual scheme is followed: percpu (now
> also per-memcg) counters are incremented in the update path, and only
> propagated to per-memcg atomics when they exceed a certain threshold.
>
> This provides two benefits:
> (a) On large machines with a lot of memcgs, the global threshold can be
> reached relatively fast, so guarding the underlying lock becomes less
> effective. Making the threshold per-memcg avoids this.
>
> (b) Having a global threshold makes it hard to do subtree flushes, as we
> cannot reset the global counter except for a full flush. Per-memcg
> counters remove this as a blocker for subtree flushes, which helps avoid
> unnecessary work when the stats of a small subtree are needed.
>
> Nothing is free, of course. This comes at a cost:
> (a) A new per-cpu counter per memcg, consuming NR_CPUS * NR_MEMCGS * 4
> bytes. The extra memory usage is insignificant.
>
> (b) More work on the update side, although in the common case it will
> only be percpu counter updates. The amount of work scales with the
> number of ancestors (i.e. tree depth). This is not a new concept: adding
> a cgroup to the rstat tree involves a parent loop, and so does charging.
> Testing results below show no significant regressions.
>
> (c) The error margin in the stats for the system as a whole increases
> from NR_CPUS * MEMCG_CHARGE_BATCH to NR_CPUS * MEMCG_CHARGE_BATCH *
> NR_MEMCGS. This is probably fine because we have a similar per-memcg
> error in charges coming from percpu stocks, and we have a periodic
> flusher that makes sure we always flush all the stats every 2s anyway.
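
For readers of the archive, a minimal userspace sketch of the scheme
described above: percpu (per-memcg) counters on the update path spill
into a per-memcg atomic, and the flush path compares that atomic against
a threshold. All names and constants here (memcg_stat_updated,
memcg_needs_flush, UPDATE_BATCH, FLUSH_THRESH, NCPUS) are invented for
illustration; they are not the actual mm/memcontrol.c symbols, and the
rstat machinery the real code hooks into is omitted.

/* Illustrative model only -- invented names, not kernel code. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define NCPUS         4
#define UPDATE_BATCH  64                      /* per-cpu batch of pending updates */
#define FLUSH_THRESH  (NCPUS * UPDATE_BATCH)  /* per-memcg "worth flushing" bar */

struct memcg {
        struct memcg *parent;
        long pending[NCPUS];         /* models the new percpu counter per memcg */
        atomic_long stats_updates;   /* per-memcg aggregate of pending updates */
};

/*
 * Update path: walk the ancestors (like the rstat parent loop), bump each
 * memcg's percpu counter, and spill into the per-memcg atomic only once
 * the local count reaches the batch size.
 */
static void memcg_stat_updated(struct memcg *memcg, int cpu, int val)
{
        struct memcg *m;

        for (m = memcg; m; m = m->parent) {
                m->pending[cpu] += abs(val);
                if (m->pending[cpu] < UPDATE_BATCH)
                        continue;
                atomic_fetch_add(&m->stats_updates, m->pending[cpu]);
                m->pending[cpu] = 0;
        }
}

/* Flush path: skip the expensive (subtree) flush below the threshold. */
static bool memcg_needs_flush(struct memcg *memcg)
{
        return atomic_load(&memcg->stats_updates) > FLUSH_THRESH;
}

static void memcg_maybe_flush(struct memcg *memcg)
{
        if (!memcg_needs_flush(memcg))
                return;
        /* a real implementation would do the rstat subtree flush here */
        atomic_store(&memcg->stats_updates, 0);
}

int main(void)
{
        struct memcg root = { 0 };
        struct memcg child = { .parent = &root };

        /* 600 single-unit updates spread over the cpus */
        for (int i = 0; i < 600; i++)
                memcg_stat_updated(&child, i % NCPUS, 1);

        printf("child needs flush: %d, root needs flush: %d\n",
               memcg_needs_flush(&child), memcg_needs_flush(&root));
        memcg_maybe_flush(&child);
        return 0;
}

The point is that the shared atomic is touched only once per UPDATE_BATCH
updates per cpu per memcg, and a reader can decide whether a (subtree)
flush is worthwhile by consulting a single per-memcg counter instead of
one global counter.
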
> This patch was tested to make sure no significant regressions are
> introduced on the update path as follows. The following benchmarks were
> run in a cgroup that is 2 levels deep (/sys/fs/cgroup/a/b/):
>
> (1) Running 22 instances of netperf on a 44 cpu machine with
> hyperthreading disabled. All instances are run in a level 2 cgroup, as
> well as netserver:
>   # netserver -6
>   # netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K
>
> Averaging 20 runs, the numbers are as follows:
> Base:    40198.0 mbps
> Patched: 38629.7 mbps (-3.9%)
>
> The regression is minimal, especially for 22 instances in the same
> cgroup sharing all ancestors (so updating the same atomics).
>
> (2) will-it-scale page_fault tests. These tests (specifically
> per_process_ops in the page_fault3 test) previously detected a 25.9%
> regression for a change in the stats update path [1]. These are the
> numbers from 10 runs (+ is good) on a machine with 256 cpus:
>
> LABEL                         |     MEAN    |    MEDIAN   |    STDDEV   |
> ------------------------------+-------------+-------------+-------------
> page_fault1_per_process_ops   |             |             |             |
> (A) base                      |  270249.164 |  265437.000 |   13451.836 |
> (B) patched                   |  261368.709 |  255725.000 |   13394.767 |
>                               |      -3.29% |      -3.66% |             |
> page_fault1_per_thread_ops    |             |             |             |
> (A) base                      |  242111.345 |  239737.000 |   10026.031 |
> (B) patched                   |  237057.109 |  235305.000 |    9769.687 |
>                               |      -2.09% |      -1.85% |             |
> page_fault1_scalability       |             |             |             |
> (A) base                      |    0.034387 |    0.035168 |   0.0018283 |
> (B) patched                   |    0.033988 |    0.034573 |   0.0018056 |
>                               |      -1.16% |      -1.69% |             |
> page_fault2_per_process_ops   |             |             |             |
> (A) base                      |  203561.836 |  203301.000 |    2550.764 |
> (B) patched                   |  197195.945 |  197746.000 |    2264.263 |
>                               |      -3.13% |      -2.73% |             |
> page_fault2_per_thread_ops    |             |             |             |
> (A) base                      |  171046.473 |  170776.000 |    1509.679 |
> (B) patched                   |  166626.327 |  166406.000 |     768.753 |
>                               |      -2.58% |      -2.56% |             |
> page_fault2_scalability       |             |             |             |
> (A) base                      |    0.054026 |    0.053821 |  0.00062121 |
> (B) patched                   |    0.053329 |     0.05306 |  0.00048394 |
>                               |      -1.29% |      -1.41% |             |
> page_fault3_per_process_ops   |             |             |             |
> (A) base                      | 1295807.782 | 1297550.000 |    5907.585 |
> (B) patched                   | 1275579.873 | 1273359.000 |    8759.160 |
>                               |      -1.56% |      -1.86% |             |
> page_fault3_per_thread_ops    |             |             |             |
> (A) base                      |  391234.164 |  390860.000 |    1760.720 |
> (B) patched                   |  377231.273 |  376369.000 |    1874.971 |
>                               |      -3.58% |      -3.71% |             |
> page_fault3_scalability       |             |             |             |
> (A) base                      |     0.60369 |     0.60072 |   0.0083029 |
> (B) patched                   |     0.61733 |     0.61544 |    0.009855 |
>                               |      +2.26% |      +2.45% |             |
>
> All regressions seem to be minimal, and within the normal variance for
> the benchmark. The fix for [1] assumes that 3% is noise -- and there
> were no further practical complaints -- so hopefully this means that such
> variations in these microbenchmarks do not reflect on practical
> workloads.
>
> (3) I also ran stress-ng in a nested cgroup and did not observe any
> obvious regressions.
>
> [1] https://lore.kernel.org/all/20190520063534.GB19312@shao2-debian/
>
> Suggested-by: Johannes Weiner
> Signed-off-by: Yosry Ahmed
> Tested-by: Domenico Cerasuolo

Acked-by: Shakeel Butt