Received: by 2002:a05:6358:c692:b0:131:369:b2a3 with SMTP id fe18csp5976262rwb; Tue, 1 Aug 2023 10:22:52 -0700 (PDT) X-Google-Smtp-Source: APBJJlH1viUQQGEGna9/aMTy/4YhgUjbP7JPutqDHWhm4rdRb9YoEsm7MVl7laMPlEoM0qcv8vJM X-Received: by 2002:a05:6870:6127:b0:1b0:189c:87a0 with SMTP id s39-20020a056870612700b001b0189c87a0mr15598033oae.41.1690910572093; Tue, 01 Aug 2023 10:22:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690910572; cv=none; d=google.com; s=arc-20160816; b=frX7sziIXy3gGQsGe8AxCiOv4SnQO6t1D1uer0QZ7yCi7wmGGURhwEPWctsC8HXhK/ eB87hyGNo2PN9pgBXhAircRRoLSGL90Wx2SkuRKfOdfLbL9KRhgzrAWkTnfD88uDCXve ZApf+MAQU71Z70HiScP3l8Sx2CUyWswN4/9NdOupXpCtb2nk3+6f0OHm7Inr55+YdYum Z3ToyZbpO+8V9vu3q347SGt/lrMPVNluET5i7vqTeiBLrDnaCOv3BZvnJHEfWB5jVhdd n1AL8IGWXZV6I/8a3ICVPYo/ti0EO8bDlE11sFReLkc8cLg/ggJ1I7Qpgp9RXeXJbnn5 mS8w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=4qjFUw6+lEzkCCKqkbY3T91pufwJOps9NprX6UVn+h8=; fh=cEFMh3PNd9iVRXfXxHrsoz5nv09ZEIkAvHmeg3TI+FU=; b=ElkVT/f95nWM/BZHXIN3zY8AUSyG2nv0vetx8YphrvRGJMRe7rqsn93EF1E7eBkAcL ZNDn4XYTEWoaH1fu51G69AoVveon+yJ/xdI9tOIi1X2EjoKzwyISvRWiwXkQluypQe1k MNYF9KYCCIzR7cjDZDjzLmHbIrvzxgILj8aw+fTaM94ISW1Udc7kaT+oBe3KXj0VLSxP 1vOb0Yhh7Q4YzvgbavOK1ioUW0mi1wddezODtdkXVaWqg+JcK5aI4N+WskFGD8MKjT2Z uH0St+ddBTCGq30lcrIIIxBcEC8d2BXVRBBAB9N6ROaX5d8+StueOOZR4LruL+0jHjCN 1AVg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=p6fiXJVO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ne7-20020a17090b374700b00263d1729212si2488064pjb.85.2023.08.01.10.22.37; Tue, 01 Aug 2023 10:22:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=p6fiXJVO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231540AbjHAQkS (ORCPT + 99 others); Tue, 1 Aug 2023 12:40:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231607AbjHAQkQ (ORCPT ); Tue, 1 Aug 2023 12:40:16 -0400 Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C11B11FFE for ; Tue, 1 Aug 2023 09:40:14 -0700 (PDT) Received: by mail-ej1-x62f.google.com with SMTP id a640c23a62f3a-98dfb3f9af6so922661566b.2 for ; Tue, 01 Aug 2023 09:40:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1690908013; x=1691512813; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=4qjFUw6+lEzkCCKqkbY3T91pufwJOps9NprX6UVn+h8=; b=p6fiXJVO6VAT7ddDlyeNeKo0u9g75XBshHTDYQkWpxGEUOGoOVAa7lHgsGvQZ5v/+W OL8o5krg+sbCk3k/ILPv5kJcGtc7Hl7e/Ptk68GNbUYDFS1ZLOXSdMFbtMgISIwiBvXM xA5PCRA4fUyBY1hNJB6x57RpX4pHwvLO70MzF7vtOTGutOLMa2G02e6PwMjrWoV32oEH b6YlMDU2PYjQiFSB63POIJeR3bgDccie71bMuCrG1M0DF1oPPZLzqfczUjMBmkdzgXbl zrQDN3zGYffW49VbgVfFPFSdFMhkF99msYQ0hnralZcPwwvzI8CmOTzVDqdh+UFOFJjq Lm5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690908013; x=1691512813; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4qjFUw6+lEzkCCKqkbY3T91pufwJOps9NprX6UVn+h8=; b=bJmIMYUb1bQk+Juj0e4+LDujPHeTFX7s/mEF8gj8kY3x/v8BPrESwvdhyEgkJZkL1w wDDJUO65CAvGfdNgEjFLASWcVW7VuO/Yam99VNrOeZ9f/ri2hXx2smOCeM8Wa+o9biew EPwKzjqFg02tWbpD1fKboflLxhg/QPrjlnsl42TXsk/qDEDFsTen1jCFjeJdQRa1bysS GlEJO4iZoSYN+hSpRfFBD9/nqNXMDk1o3rv42ftCcc2Jgv7DfO+zt0csrIsPeGn3C4E2 sSpT7VWBK4Xct8fG9zrayq3hxeJGsnnP1TN8uSchCl3VEgf2NRMl8QsRkXhrhH9309n8 6XGw== X-Gm-Message-State: ABy/qLYB3C+Fh/9hzcF6MRH3f8jMgUvJJGBYsZvbO7DVhHjcmvAMXOuG 2wlQ94w1C69BxgMo3EpBJ1or6dta1VboQxemyJXxXw== X-Received: by 2002:a17:907:75d0:b0:989:450:e567 with SMTP id jl16-20020a17090775d000b009890450e567mr2977498ejc.65.1690908012991; Tue, 01 Aug 2023 09:40:12 -0700 (PDT) MIME-Version: 1.0 References: <20230726153223.821757-1-yosryahmed@google.com> <20230726153223.821757-2-yosryahmed@google.com> In-Reply-To: From: Yosry Ahmed Date: Tue, 1 Aug 2023 09:39:36 -0700 Message-ID: Subject: Re: [PATCH v3] mm: memcg: use rstat for non-hierarchical stats To: Michal Hocko Cc: Johannes Weiner , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 1, 2023 at 7:30=E2=80=AFAM Michal Hocko wrote= : > > On Wed 26-07-23 15:32:23, Yosry Ahmed wrote: > > Currently, memcg uses rstat to maintain aggregated hierarchical stats. > > Counters are maintained for hierarchical stats at each memcg. Rstat > > tracks which cgroups have updates on which cpus to keep those counters > > fresh on the read-side. > > > > Non-hierarchical stats are currently not covered by rstat. Their > > per-cpu counters are summed up on every read, which is expensive. > > The original implementation did the same. At some point before rstat, > > non-hierarchical aggregated counters were introduced by > > commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in > > memory.stat reporting"). However, those counters were updated on the > > performance critical write-side, which caused regressions, so they were > > later removed by commit 815744d75152 ("mm: memcontrol: don't batch > > updates of local VM stats and events"). See [1] for more detailed > > history. > > > > Kernel versions in between a983b5ebee57 & 815744d75152 (a year and a > > half) enjoyed cheap reads of non-hierarchical stats, specifically on > > cgroup v1. When moving to more recent kernels, a performance regression > > for reading non-hierarchical stats is observed. > > > > Now that we have rstat, we know exactly which percpu counters have > > updates for each stat. We can maintain non-hierarchical counters again, > > making reads much more efficient, without affecting the performance > > critical write-side. Hence, add non-hierarchical (i.e local) counters > > for the stats, and extend rstat flushing to keep those up-to-date. > > > > A caveat is that we now need a stats flush before reading > > local/non-hierarchical stats through {memcg/lruvec}_page_state_local() > > or memcg_events_local(), where we previously only needed a flush to > > read hierarchical stats. Most contexts reading non-hierarchical stats > > are already doing a flush, add a flush to the only missing context in > > count_shadow_nodes(). > > > > With this patch, reading memory.stat from 1000 memcgs is 3x faster on a > > machine with 256 cpus on cgroup v1: > > # for i in $(seq 1000); do mkdir /sys/fs/cgroup/memory/cg$i; done > > # time cat /dev/cgroup/memory/cg*/memory.stat > /dev/null > > real 0m0.125s > > user 0m0.005s > > sys 0m0.120s > > > > After: > > real 0m0.032s > > user 0m0.005s > > sys 0m0.027s > > Have you measured any potential regression for cgroup v2 which collects > all this data without ever using it (AFAICS)? I did not. I did not expect noticeable regressions given that all the extra work is done during flushing, which should mostly be done by the asynchronous worker, but can also happen in the stats reading context. Let me run the same script on cgroup v2 just in case and report back. > -- > Michal Hocko > SUSE Labs