From: Yosry Ahmed
Date: Tue, 1 Aug 2023 10:29:39 -0700
Subject: Re: [PATCH v3] mm: memcg: use rstat for non-hierarchical stats
To: Michal Hocko
Cc: Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
References: <20230726153223.821757-1-yosryahmed@google.com> <20230726153223.821757-2-yosryahmed@google.com>

On Tue, Aug 1, 2023 at 9:39 AM Yosry Ahmed wrote:
>
> On Tue, Aug 1, 2023 at 7:30 AM Michal Hocko wrote:
> >
> > On Wed 26-07-23 15:32:23, Yosry Ahmed wrote:
> > > Currently, memcg uses rstat to maintain aggregated hierarchical stats.
> > > Counters are maintained for hierarchical stats at each memcg. Rstat
> > > tracks which cgroups have updates on which cpus to keep those counters
> > > fresh on the read-side.
> > >
> > > Non-hierarchical stats are currently not covered by rstat. Their
> > > per-cpu counters are summed up on every read, which is expensive.
> > > The original implementation did the same. At some point before rstat,
> > > non-hierarchical aggregated counters were introduced by
> > > commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in
> > > memory.stat reporting"). However, those counters were updated on the
> > > performance critical write-side, which caused regressions, so they were
> > > later removed by commit 815744d75152 ("mm: memcontrol: don't batch
> > > updates of local VM stats and events"). See [1] for more detailed
> > > history.
> > >
> > > Kernel versions in between a983b5ebee57 & 815744d75152 (a year and a
> > > half) enjoyed cheap reads of non-hierarchical stats, specifically on
> > > cgroup v1. When moving to more recent kernels, a performance regression
> > > for reading non-hierarchical stats is observed.
> > >
> > > Now that we have rstat, we know exactly which percpu counters have
> > > updates for each stat. We can maintain non-hierarchical counters again,
> > > making reads much more efficient, without affecting the performance
> > > critical write-side. Hence, add non-hierarchical (i.e local) counters
> > > for the stats, and extend rstat flushing to keep those up-to-date.
> > >
> > > A caveat is that we now need a stats flush before reading
> > > local/non-hierarchical stats through {memcg/lruvec}_page_state_local()
> > > or memcg_events_local(), where we previously only needed a flush to
> > > read hierarchical stats. Most contexts reading non-hierarchical stats
> > > are already doing a flush, add a flush to the only missing context in
> > > count_shadow_nodes().
> > >
> > > With this patch, reading memory.stat from 1000 memcgs is 3x faster on a
> > > machine with 256 cpus on cgroup v1:
> > > # for i in $(seq 1000); do mkdir /sys/fs/cgroup/memory/cg$i; done
> > > # time cat /dev/cgroup/memory/cg*/memory.stat > /dev/null
> > > real    0m0.125s
> > > user    0m0.005s
> > > sys     0m0.120s
> > >
> > > After:
> > > real    0m0.032s
> > > user    0m0.005s
> > > sys     0m0.027s
> >
> > Have you measured any potential regression for cgroup v2 which collects
> > all this data without ever using it (AFAICS)?
>
> I did not. I did not expect noticeable regressions given that all the
> extra work is done during flushing, which should mostly be done by the
> asynchronous worker, but can also happen in the stats reading context.
> Let me run the same script on cgroup v2 just in case and report back.

A few runs on mm-unstable with this patch:
# time cat /sys/fs/cgroup/cg*/memory.stat > /dev/null
real    0m0.020s
user    0m0.005s
sys     0m0.015s

# time cat /sys/fs/cgroup/cg*/memory.stat > /dev/null
real    0m0.017s
user    0m0.005s
sys     0m0.012s

# time cat /sys/fs/cgroup/cg*/memory.stat > /dev/null
real    0m0.016s
user    0m0.004s
sys     0m0.012s

A few runs on mm-unstable with the patch reverted:
# time cat /sys/fs/cgroup/cg*/memory.stat > /dev/null
real    0m0.020s
user    0m0.005s
sys     0m0.015s

# time cat /sys/fs/cgroup/cg*/memory.stat > /dev/null
real    0m0.016s
user    0m0.004s
sys     0m0.012s

# time cat /sys/fs/cgroup/cg*/memory.stat > /dev/null
real    0m0.017s
user    0m0.005s
sys     0m0.012s

It looks like there are no regressions on cgroup v2 when reading the stats.
Please let me know if you want me to send a new version with the
cgroup v2 results as well in the commit log -- or I can just send a
new commit log. Whatever is easier for Andrew.

>
> > --
> > Michal Hocko
> > SUSE Labs
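
For readers following the thread, here is a toy user-space model of the scheme described in the quoted commit message: writers touch only per-CPU deltas, and a flush folds those deltas into both a local (non-hierarchical) counter and the hierarchical counters of the cgroup and its ancestors. It is only a sketch under stated assumptions. ToyMemcg, NR_CPUS, update() and flush() are hypothetical names invented for illustration, and the model ignores rstat's tracking of which cgroups and cpus actually have pending updates.

```python
# Toy model only: every name here is hypothetical and none of this is
# the kernel's rstat or memcg code.

NR_CPUS = 4  # assumption for illustration


class ToyMemcg:
    def __init__(self, parent=None):
        self.parent = parent
        self.percpu = [0] * NR_CPUS  # write side: per-CPU deltas only
        self.local = 0               # non-hierarchical: this cgroup only
        self.hier = 0                # hierarchical: this cgroup and descendants

    def update(self, cpu, delta):
        """Write side: touch a single per-CPU slot, no aggregation."""
        self.percpu[cpu] += delta

    def flush(self):
        """Fold pending per-CPU deltas into the aggregated counters."""
        pending = sum(self.percpu)
        self.percpu = [0] * NR_CPUS
        self.local += pending        # the extra step for local counters
        node = self
        while node is not None:      # propagate up the hierarchy
            node.hier += pending
            node = node.parent


root = ToyMemcg()
child = ToyMemcg(parent=root)

child.update(cpu=0, delta=3)
child.update(cpu=2, delta=5)
root.update(cpu=1, delta=1)

# A reader flushes first, then reads one aggregated value per counter
# instead of summing per-CPU counters on every read.
for cg in (child, root):
    cg.flush()

print(child.local, child.hier)  # 8 8
print(root.local, root.hier)    # 1 9
```

In this model the only added cost of the local counter is one extra addition per flush, which matches the expectation stated above that the extra work lands in the flush path and not on the write side.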