From: Wei Xu
Date: Thu, 2 Nov 2023 21:27:44 -0700
Subject: Re: [PATCH v5 1/1] mm: report per-page metadata information
To: Pasha Tatashin
Cc: David Hildenbrand, Sourav Panda, corbet@lwn.net, gregkh@linuxfoundation.org,
    rafael@kernel.org, akpm@linux-foundation.org, mike.kravetz@oracle.com,
    muchun.song@linux.dev, rppt@kernel.org, rdunlap@infradead.org,
    chenlinxuan@uniontech.com, yang.yang29@zte.com.cn, tomas.mudrunka@gmail.com,
    bhelgaas@google.com, ivan@cloudflare.com, yosryahmed@google.com,
    hannes@cmpxchg.org, shakeelb@google.com, kirill.shutemov@linux.intel.com,
    wangkefeng.wang@huawei.com, adobriyan@gmail.com, vbabka@suse.cz,
    Liam.Howlett@oracle.com, surenb@google.com, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org,
    willy@infradead.org, Greg Thelen

On Thu, Nov 2, 2023 at 6:07 PM Pasha Tatashin wrote:
>
> On Thu, Nov 2, 2023 at 4:22 PM Wei Xu wrote:
> >
> > On Thu, Nov 2, 2023 at 11:34 AM Pasha Tatashin wrote:
> > >
> > > > > > I could have sworn that I pointed that out in a previous version and
> > > > > > requested to document that special case in the patch description. :)
> > > > >
> > > > > Sounds good, we will document that parts of per-page metadata may not
> > > > > be part of MemTotal.
> > > >
> > > > But this still doesn't answer how we can use the new PageMetadata
> > > > field to help break down the runtime kernel overhead within MemUsed
> > > > (MemTotal - MemFree).
> > >
> > > I am not sure it matters to the end users: they look at PageMetadata
> > > with or without Page Owner, page_table_check, or HugeTLB, and it shows
> > > exactly how much the per-page overhead changed. Where the kernel
> > > allocated that memory is not that important to the end user, as long
> > > as that memory became available to them.
> > >
> > > In addition, it is still possible to estimate the actual memblock part
> > > of per-page metadata by looking at /proc/zoneinfo:
> > >
> > > Memblock-reserved per-page metadata: "present_pages - managed_pages"
> >
> > This assumes that all reserved memblocks are per-page metadata. As I
>
> Right after boot, when all per-page metadata is still from memblocks,
> we could determine what part of the zone-reserved memory is not
> per-page metadata, and use it later in our calculations.
>
> > mentioned earlier, it is not a robust approach.
>
> > > If there is something big that we will allocate in that range, we
> > > should probably also export it in some form.
> > >
> > > If this field does not fit in /proc/meminfo due to not fully being
> > > part of MemTotal, we could just keep it under nodeN/, as a separate
> > > file, as suggested by Greg.
> > >
> > > However, I think it is useful enough to have an easy system-wide view
> > > of per-page metadata.
> >
> > It is fine to have this as a separate, informational sysfs file under
> > nodeN/, outside of meminfo. I just don't think that, with the current
> > implementation (where PageMetadata is a mixture of buddy and memblock
> > allocations), it can help with the use case that motivates this
> > change, i.e. improving the breakdown of the kernel overhead.
> >
> > > > > > > are allocated), so what would be the best way to export page metadata
> > > > > > > without redefining MemTotal? Keep the new field in /proc/meminfo but
> > > > > > > be ok that it is not part of MemTotal, or do two counters? If we do
> > > > > > > two counters, we will still need to keep one that is a buddy
> > > > > > > allocator counter in /proc/meminfo and the other one somewhere outside?
> > > >
> > > > I think the simplest thing to do now is to only report the buddy
> > > > allocations of per-page metadata in meminfo. The meaning of the new
> > >
> > > This will cause PageMetadata to be 0 on 99% of the systems, and
> > > essentially become useless to the vast majority of users.
> >
> > I don't think it is a major issue. There are other fields (e.g. Zswap)
> > in meminfo that remain 0 when the feature is not used.
>
> Since we are going to use two independent interfaces, /proc/meminfo
> PageMetadata and nodeN/page_metadata (in a separate file, as requested
> by Greg), how about if in /proc/meminfo we provide only the buddy
> allocator part, and in nodeN/page_metadata we provide the total
> per-page overhead in the given node, including both memblock reserves
> and buddy allocator memory?

What we want is the system-wide breakdown of kernel memory usage. Having
the new PageMetadata counter in /proc/meminfo report only the
buddy-allocated per-page metadata works for this use case.

> Pasha
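
For readers following along, here is a minimal sketch of the breakdown Wei
describes, assuming the counter appears in /proc/meminfo under the name
PageMetadata and, as proposed in this thread, covers only buddy-allocated
per-page metadata (so it is a subset of MemTotal). The particular sets of
"kernel" and "user" counters below are illustrative choices, not an exact
accounting model.

#!/usr/bin/env python3
"""Break down MemUsed (MemTotal - MemFree) using /proc/meminfo counters.

Assumes a PageMetadata field exists in /proc/meminfo (per this proposal)
and that it reports only buddy-allocated per-page metadata.
"""

def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into {field: value in kB}."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            fields[key.strip()] = int(rest.split()[0])
    return fields

def breakdown(mi):
    """Split MemUsed into user-visible, tracked-kernel, and untracked parts."""
    used = mi["MemTotal"] - mi["MemFree"]
    kernel_fields = ("Slab", "KernelStack", "PageTables", "Percpu", "PageMetadata")
    kernel = {k: mi[k] for k in kernel_fields if k in mi}
    user = sum(mi.get(k, 0) for k in ("AnonPages", "Cached", "Buffers"))
    return used, user, kernel, used - user - sum(kernel.values())

if __name__ == "__main__":
    used, user, kernel, other = breakdown(read_meminfo())
    print(f"MemUsed            {used} kB")
    print(f"  user-visible     {user} kB")
    for name, kb in kernel.items():
        print(f"  {name:<16} {kb} kB")
    print(f"  other/untracked  {other} kB")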
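
And a sketch of the /proc/zoneinfo estimate Pasha suggests above: per zone,
present_pages - managed_pages gives the memblock-reserved pages, most of
which right after boot are boot-time per-page metadata (vmemmap). As Wei
notes, other memblock reservations are counted too, so treat the result as
an upper bound. The 4 KiB page size is an assumption; query it at runtime
on a real system.

#!/usr/bin/env python3
"""Estimate memblock-reserved memory per zone from /proc/zoneinfo."""
import re

PAGE_SIZE_KB = 4  # assumption: 4 KiB pages (check `getconf PAGESIZE`)

def memblock_reserved_kb(path="/proc/zoneinfo"):
    """Return {(node, zone): (present - managed) pages, expressed in kB}."""
    reserved = {}
    zone = present = managed = None
    with open(path) as f:
        for line in f:
            m = re.match(r"Node (\d+), zone\s+(\S+)", line)
            if m:
                zone = (int(m.group(1)), m.group(2))
                present = managed = None
                continue
            m = re.match(r"\s+present\s+(\d+)\s*$", line)
            if m:
                present = int(m.group(1))
            m = re.match(r"\s+managed\s+(\d+)\s*$", line)
            if m:
                managed = int(m.group(1))
            if zone and present is not None and managed is not None:
                reserved[zone] = (present - managed) * PAGE_SIZE_KB
                zone = None
    return reserved

if __name__ == "__main__":
    per_zone = memblock_reserved_kb()
    for (node, name), kb in sorted(per_zone.items()):
        print(f"node {node} zone {name:<8} reserved ~{kb} kB")
    print(f"total (upper bound on memblock per-page metadata): "
          f"{sum(per_zone.values())} kB")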