Received: by 2002:a05:6358:701b:b0:131:369:b2a3 with SMTP id 27csp4571844rwo; Tue, 25 Jul 2023 08:00:52 -0700 (PDT) X-Google-Smtp-Source: APBJJlFrHWqYGCPZK5o0MAPSRvA09B0K50zNRLLCj8enW449Jg6daq5OZu9Avy80b9oFWWUGuH1I X-Received: by 2002:a05:6358:8810:b0:132:db25:bbfc with SMTP id hv16-20020a056358881000b00132db25bbfcmr12950527rwb.2.1690297251897; Tue, 25 Jul 2023 08:00:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690297251; cv=none; d=google.com; s=arc-20160816; b=E6HWiylrZHAxUxXv4pzQDfVx+TckkZVrc/XSVje1XgaVLgaNjQbT9ySXvG+4Vj8qnr wllAyIGjkIiSgc+Sp+lWpCdVfGR9MKsB2PTJFFuqwKS/D9wL3e509DQuQTnpCDdoMmJM fWdfLdTJMHdxOKPSeOTNZzD6cMAomMFmUAGt2ikS3mic3Vg5cX/+wuHPB0M39tNp/Akf jgxc7FIFkwYtOmGqvj2QWhxYcUnmgPaXt/3jGwWYnhGbB0BEI5BJkXFy3tvu5D+y+3j4 iOW2amO9G3vwpLTcVd8xoLyBcSKK8omnNqGQnXzy1ZfVUeQa0hiZvYume453Bpv59aJC GWQA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=L6q2fbSEUaRmabz4YeOC1QVhb19awtdwyXc9KAJHQSg=; fh=TN8/E2QF8d8XaxcFfOvBnFe853L7WWmRWsEu3wY/AyA=; b=t9Dvf6o79uOudiSlxCCWkfOnSDnMTtXoKi8taygXrc6FwvUwS/WottJ/6eg7ulwwuT pBhMeNmlYYWP/MNgqLPcpbwa+FZy9ub7UMC8BEZWGdu3pcGgCDDUjl6HKk+fNST6HVir 6FJhc1c1+0Nu9lumNUmYn7E9g3Yq38DGlMkdkwXNmVdwkfeuzgIkfoVKPfjjn+ci0FSt qtY1zONHlpuxt1Eb1pBJLjk55tySoCDDZG9gKuEIpDLmuHjzaAlfe1Ccf5/grYZHOnfX 4TeV5m/MsPBz0k/WvAsa30DRhb7AgyZUsbbBaxLLTrNyHOBO9UKRo+aT3QjD7DWQzWO9 tX3A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20221208.gappssmtp.com header.s=20221208 header.b=NhRu+ZEB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w64-20020a638243000000b0055c85a94f11si11673518pgd.276.2023.07.25.08.00.37; Tue, 25 Jul 2023 08:00:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20221208.gappssmtp.com header.s=20221208 header.b=NhRu+ZEB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232186AbjGYOEn (ORCPT + 99 others); Tue, 25 Jul 2023 10:04:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53778 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230104AbjGYOEm (ORCPT ); Tue, 25 Jul 2023 10:04:42 -0400 Received: from mail-qv1-xf2a.google.com (mail-qv1-xf2a.google.com [IPv6:2607:f8b0:4864:20::f2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 372F91FC9 for ; Tue, 25 Jul 2023 07:04:37 -0700 (PDT) Received: by mail-qv1-xf2a.google.com with SMTP id 6a1803df08f44-63d09d886a6so10669316d6.2 for ; Tue, 25 Jul 2023 07:04:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1690293876; x=1690898676; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=L6q2fbSEUaRmabz4YeOC1QVhb19awtdwyXc9KAJHQSg=; b=NhRu+ZEBbGmHUyV23DWyyUEl6Nq9VUT++p+9QWJdkAJ01svswkr1clQ+XgDEJoKZlu Weht+NskAhuvf8tGjjG2GgCJMoox5KBq9h7cac9rQqrC1vpsp28oMsNe9KY4CjN98NgM 2onTXCyaJs2t96h7XB5cCypHFdWg+fwb65PFB8O3ZrvuiaJHuzW2QZDMOqGry5xTxZnL a0LtNC5vIK9Xsexs+3OgGgt5bslUleSsJtnl1gA4F45g2Ko2FIGEpl62me3wUqr1jORf 2aM4RpjbPng4AlA4q3lBQwcuniuBSVYARmbS0pCKD+STkQGtoiTn6CFXFLVdidVlar95 wCyQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690293876; x=1690898676; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=L6q2fbSEUaRmabz4YeOC1QVhb19awtdwyXc9KAJHQSg=; b=AjOcM7JsRX2EEeEhIs5C9VGYzYiHl0l1MRMqjz4hsnSZxJR/4jF2MM94rM4a4sclc/ jDqinGsf6/tMkGMcJT2iD0tVaVustojDmqXbXr5XH1jIrMJJ+83UMjmdGmYrBus0Bgun PlFGzYi3mhEMRs+htPX2nbnxnvD63bw14Egj+erdmdiKMTN0eNxVPbspLThnoTJ440i/ WuDleei2wZc3xdZfVH4asinRQr9mPmv0b/jxTWd0LMAUH9FoRsAE+VMUvFAGe+B4hw0x Db5q/nboRbY7VqckZwUA2M4b24airnlSr5J4qghQZhUpRTsMouwwT5N2wnoWtbRMIwdX 1m9g== X-Gm-Message-State: ABy/qLZFe9ie1PgG9OC7TeGQ+pSWmxkEl2vjFOPd+ZMTzJkz2dOVNMFt Fv0BZS06ZtXjdF2NSmi93dmCbQ== X-Received: by 2002:a0c:9a4f:0:b0:628:8185:bd6e with SMTP id q15-20020a0c9a4f000000b006288185bd6emr2524662qvd.5.1690293876314; Tue, 25 Jul 2023 07:04:36 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:ad06]) by smtp.gmail.com with ESMTPSA id q4-20020a0cf5c4000000b00636047c276csm1368975qvm.126.2023.07.25.07.04.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Jul 2023 07:04:36 -0700 (PDT) Date: Tue, 25 Jul 2023 10:04:35 -0400 From: Johannes Weiner To: Yosry Ahmed Cc: Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH] mm: memcg: use rstat for non-hierarchical stats Message-ID: <20230725140435.GB1146582@cmpxchg.org> References: <20230719174613.3062124-1-yosryahmed@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230719174613.3062124-1-yosryahmed@google.com> X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 19, 2023 at 05:46:13PM +0000, Yosry Ahmed wrote: > Currently, memcg uses rstat to maintain hierarchical stats. The rstat > framework keeps track of which cgroups have updates on which cpus. > > For non-hierarchical stats, as memcg moved to rstat, they are no longer > readily available as counters. Instead, the percpu counters for a given > stat need to be summed to get the non-hierarchical stat value. This > causes a performance regression when reading non-hierarchical stats on > kernels where memcg moved to using rstat. This is especially visible > when reading memory.stat on cgroup v1. There are also some code paths > internal to the kernel that read such non-hierarchical stats. It's actually not an rstat regression. It's always been this costly. Quick history: We used to maintain *all* stats in per-cpu counters at the local level. memory.stat reads would have to iterate and aggregate the entire subtree every time. This was obviously very costly, so we added batched upward propagation during stat updates to simplify reads: commit 42a300353577ccc17ecc627b8570a89fa1678bec Author: Johannes Weiner Date: Tue May 14 15:47:12 2019 -0700 mm: memcontrol: fix recursive statistics correctness & scalabilty However, that caused a regression in the stat write path, as the upward propagation would bottleneck on the cachelines in the shared parents. The fix for *that* re-introduced the per-cpu loops in the local stat reads: commit 815744d75152078cde5391fc1e3c2d4424323fb6 Author: Johannes Weiner Date: Thu Jun 13 15:55:46 2019 -0700 mm: memcontrol: don't batch updates of local VM stats and events So I wouldn't say it's a regression from rstat. Except for that short period between the two commits above, the read side for local stats was always expensive. rstat promises a shot at finally fixing it, with less risk to the write path. > It is inefficient to iterate and sum counters in all cpus when the rstat > framework knows exactly when a percpu counter has an update. Instead, > maintain cpu-aggregated non-hierarchical counters for each stat. During > an rstat flush, keep those updated as well. When reading > non-hierarchical stats, we no longer need to iterate cpus, we just need > to read the maintainer counters, similar to hierarchical stats. > > A caveat is that we now a stats flush before reading > local/non-hierarchical stats through {memcg/lruvec}_page_state_local() > or memcg_events_local(), where we previously only needed a flush to > read hierarchical stats. Most contexts reading non-hierarchical stats > are already doing a flush, add a flush to the only missing context in > count_shadow_nodes(). > > With this patch, reading memory.stat from 1000 memcgs is 3x faster on a > machine with 256 cpus on cgroup v1: > # for i in $(seq 1000); do mkdir /sys/fs/cgroup/memory/cg$i; done > # time cat /dev/cgroup/memory/cg*/memory.stat > /dev/null > real 0m0.125s > user 0m0.005s > sys 0m0.120s > > After: > real 0m0.032s > user 0m0.005s > sys 0m0.027s > > Signed-off-by: Yosry Ahmed Acked-by: Johannes Weiner But I want to be clear: this isn't a regression fix. It's a new performance optimization for the deprecated cgroup1 code. And it comes at the cost of higher memory footprint for both cgroup1 AND cgroup2. If this causes a regression, we should revert it again. But let's try.