Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp79098pxk; Tue, 15 Sep 2020 21:18:45 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxSBcv12kUOQ5N6WZPUqWDu4pThisNMH7GgAk2dpwUtgJhZ9lpbTTABKTPF9FULpm5X9k1Y X-Received: by 2002:a17:906:7116:: with SMTP id x22mr24197844ejj.426.1600229925660; Tue, 15 Sep 2020 21:18:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600229925; cv=none; d=google.com; s=arc-20160816; b=O/pZw6PxJrJ5ziDYoSBa0cY/s3hy5P27y2klCpZPQWcc8yPwG2sqLUIv4qYaFWmZM1 jXMAbOsBe6zArDm+53EwE+SkDknulMXa1M8e7lMoE4Azdpv5D/vSSlwbBjf5HUGtZrRM XdJsvTqyn/wXZAp5v3Z2+gZ5Cy2uaQOTAYtqJFamXOb/O7CeeNgrg2JJSSiw0VtDIWkP x8hhdYQf9XQXa4dT/oHW0V40h7XwVzrpvyGdrFtw/MbyFeelm6jp8XTCFyzP4qV3fK1s KvHuk/hST75ibInfoRcdl0foKywKJHFc4cqGZl/imql5ukYMz8z1sN7tFax81bKLOxmE oZ7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=LCRAOABNyfnd7lzn4QMxDV96zQMAtn58NYDWlvquY3g=; b=v6rHu17vtpRF+qFVDuuDu6NdrTnAZtsC2YxzKRG/j5t34HXPf9ecL795zpUn9gDExY 9gCKw121MajG9NLlzd1HwmDJVzU/wR4trspIyH4/2nJMbv8UjBQ5tyLl67X0GVX/ljH3 p1ijLyxvNY1pOqdU5y+Jtq1/16IhRVKXRZ1twm8sAaAFeR86RGiH1IxzxNOrAmKM5dNR Qj/kTYe1KR32DkzLKmjlQEaVVz06Ya+Ex/Kq4goSzLt3kHCE+dSs3yzsW1uFXX9ppWrV Nye4d5yE1ZeoSXUZI9PKj2lcJcsbZa+hraX+CB4oxqJLAWc+OgfYo+uslDu7zQiCGOhx i4UQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20150623.gappssmtp.com header.s=20150623 header.b=2G6cIT8y; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id nz21si10416705ejb.654.2020.09.15.21.18.22; Tue, 15 Sep 2020 21:18:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20150623.gappssmtp.com header.s=20150623 header.b=2G6cIT8y; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726149AbgIPEP2 (ORCPT + 99 others); Wed, 16 Sep 2020 00:15:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50174 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726129AbgIPEP0 (ORCPT ); Wed, 16 Sep 2020 00:15:26 -0400 Received: from mail-pl1-x642.google.com (mail-pl1-x642.google.com [IPv6:2607:f8b0:4864:20::642]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4B31CC061788 for ; Tue, 15 Sep 2020 21:15:26 -0700 (PDT) Received: by mail-pl1-x642.google.com with SMTP id e4so2481354pln.10 for ; Tue, 15 Sep 2020 21:15:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=LCRAOABNyfnd7lzn4QMxDV96zQMAtn58NYDWlvquY3g=; b=2G6cIT8yZsbFxrYcihmm01Gj0GSrMKff8Czby6u+EPsC5Y13yzYr4IjiO33Hbh6W/S 3XcK0ykjiUDsKYjgLk5yq6aEZlY8UOw/EMR+GXEHF5B4qga8TW5wh71rA8wRUI+O1GpF JRF8qYyUOP2vwBL0t5/skhaS8XNKzuppA5BzeZPbp+3zARCR1zbOWN7Jt4HgGiOYEoBS qyqW/6uNQxmXZke8BhbUXEvptwlrBP5RMTLKs2xagCaon+uPVYO0G1YiJ2lliVZpRdQt EoLdWPztPztdRe7hxe6Qy/FfIlXNwnh1ij5YFpXRg9gnDnrr+tYWs2UU4DIu88wynRFK qWFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=LCRAOABNyfnd7lzn4QMxDV96zQMAtn58NYDWlvquY3g=; b=i6K9fODjpXantgpbl9XAgBRHQc+rPqFYoCCqFydi8ARPIg2S/3NRIZKlmozI/Uzqf7 REtcFhHJTiSfiwe5gxI7kCIyOSwqiNrgUfGEYupnriJ1NDoZCXXqbyhuvZzVLmjsQoUr jPGcQfQF1C+NxwuI4B1pYZ2S8e+IQyrNRO9gD6TolGVSLJbvfmvHcEaRahEtGQuivqTS ZCrVeLqyKOgjoeZDKmKvP9MApgXzIlO2+eLEgNfLZbKvIR0TZKvgDaTZbPheNevvairf Y5p1Sn+kGY3B9gWRfARWp0BqZTgURkL5w37EwpsjSA0XEUVyiqk+ayB83FWwmwiQVlIg oVJg== X-Gm-Message-State: AOAM531w/7fIpOEb9REEqP48XhHHp3oruOnI0yHzqT4Uf/eTCPNh57rH URrOMo2V/zzOXWKWdwKT+toK8fdcuh8lwZFKcWM0vw== X-Received: by 2002:a17:90a:b78b:: with SMTP id m11mr2289570pjr.13.1600229725682; Tue, 15 Sep 2020 21:15:25 -0700 (PDT) MIME-Version: 1.0 References: <20200915171801.39761-1-songmuchun@bytedance.com> <20200915214845.GB189808@cmpxchg.org> In-Reply-To: <20200915214845.GB189808@cmpxchg.org> From: Muchun Song Date: Wed, 16 Sep 2020 12:14:49 +0800 Message-ID: Subject: Re: [External] Re: [PATCH v5] mm: memcontrol: Add the missing numa_stat interface for cgroup v2 To: Johannes Weiner Cc: Tejun Heo , Zefan Li , Jonathan Corbet , Michal Hocko , Vladimir Davydov , Andrew Morton , Shakeel Butt , Roman Gushchin , Randy Dunlap , Cgroups , linux-doc@vger.kernel.org, LKML , Linux Memory Management List Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 16, 2020 at 5:50 AM Johannes Weiner wrote: > > On Wed, Sep 16, 2020 at 01:18:01AM +0800, Muchun Song wrote: > > In the cgroup v1, we have a numa_stat interface. This is useful for > > providing visibility into the numa locality information within an > > memcg since the pages are allowed to be allocated from any physical > > node. One of the use cases is evaluating application performance by > > combining this information with the application's CPU allocation. > > But the cgroup v2 does not. So this patch adds the missing information. > > > > Signed-off-by: Muchun Song > > Suggested-by: Shakeel Butt > > Reviewed-by: Shakeel Butt > > Yup, that would be useful information to have. Just a few comments on > the patch below: > > > @@ -1368,6 +1368,78 @@ PAGE_SIZE multiple when read back. > > collapsing an existing range of pages. This counter is not > > present when CONFIG_TRANSPARENT_HUGEPAGE is not set. > > > > + memory.numa_stat > > + A read-only flat-keyed file which exists on non-root cgroups. > > It's a nested key file, not flat. This is just copied from memory.stat documentation.Is the memory.stat also a nested key file? > > > + This breaks down the cgroup's memory footprint into different > > + types of memory, type-specific details, and other information > > + per node on the state of the memory management system. > > + > > + This is useful for providing visibility into the NUMA locality > > + information within an memcg since the pages are allowed to be > > + allocated from any physical node. One of the use case is evaluating > > + application performance by combining this information with the > > + application's CPU allocation. > > + > > + All memory amounts are in bytes. > > + > > + The output format of memory.numa_stat is:: > > + > > + type N0= N1= ... > > + > > + The entries are ordered to be human readable, and new entries > > + can show up in the middle. Don't rely on items remaining in a > > + fixed position; use the keys to look up specific values! > > + > > + anon > > + Amount of memory per node used in anonymous mappings such > > + as brk(), sbrk(), and mmap(MAP_ANONYMOUS). > > + > > + file > > + Amount of memory per node used to cache filesystem data, > > + including tmpfs and shared memory. > > + > > + kernel_stack > > + Amount of memory per node allocated to kernel stacks. > > + > > + shmem > > + Amount of cached filesystem data per node that is swap-backed, > > + such as tmpfs, shm segments, shared anonymous mmap()s. > > + > > + file_mapped > > + Amount of cached filesystem data per node mapped with mmap(). > > + > > + file_dirty > > + Amount of cached filesystem data per node that was modified but > > + not yet written back to disk. > > + > > + file_writeback > > + Amount of cached filesystem data per node that was modified and > > + is currently being written back to disk. > > + > > + anon_thp > > + Amount of memory per node used in anonymous mappings backed by > > + transparent hugepages. > > + > > + inactive_anon, active_anon, inactive_file, active_file, unevictable > > + Amount of memory, swap-backed and filesystem-backed, > > + per node on the internal memory management lists used > > + by the page reclaim algorithm. > > + > > + As these represent internal list state (e.g. shmem pages are on > > + anon memory management lists), inactive_foo + active_foo may not > > + be equal to the value for the foo counter, since the foo counter > > + is type-based, not list-based. > > + > > + slab_reclaimable > > + Amount of memory per node used for storing in-kernel data > > + structures which might be reclaimed, such as dentries and > > + inodes. > > + > > + slab_unreclaimable > > + Amount of memory per node used for storing in-kernel data > > + structures which cannot be reclaimed on memory pressure. > > + > > memory.swap.current > > A read-only single value file which exists on non-root > > cgroups. > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 75cd1a1e66c8..ff919ef3b57b 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -6425,6 +6425,86 @@ static int memory_stat_show(struct seq_file *m, void *v) > > return 0; > > } > > > > +#ifdef CONFIG_NUMA > > +struct numa_stat { > > + const char *name; > > + unsigned int ratio; > > + enum node_stat_item idx; > > +}; > > + > > +static struct numa_stat numa_stats[] = { > > + { "anon", PAGE_SIZE, NR_ANON_MAPPED }, > > + { "file", PAGE_SIZE, NR_FILE_PAGES }, > > + { "kernel_stack", 1024, NR_KERNEL_STACK_KB }, > > + { "shmem", PAGE_SIZE, NR_SHMEM }, > > + { "file_mapped", PAGE_SIZE, NR_FILE_MAPPED }, > > + { "file_dirty", PAGE_SIZE, NR_FILE_DIRTY }, > > + { "file_writeback", PAGE_SIZE, NR_WRITEBACK }, > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > > + /* > > + * The ratio will be initialized in numa_stats_init(). Because > > + * on some architectures, the macro of HPAGE_PMD_SIZE is not > > + * constant(e.g. powerpc). > > + */ > > + { "anon_thp", 0, NR_ANON_THPS }, > > +#endif > > + { "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON }, > > + { "active_anon", PAGE_SIZE, NR_ACTIVE_ANON }, > > + { "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE }, > > + { "active_file", PAGE_SIZE, NR_ACTIVE_FILE }, > > + { "unevictable", PAGE_SIZE, NR_UNEVICTABLE }, > > + { "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B }, > > + { "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B }, > > +}; > > This is a bit duplicative with memory_stat_format(), and the > collections will easily go out of sync as we add/change stat items. > > Can you please convert memory_stat_format() to use the same shared table? > > You may have to add another flag for the MEMCG_* items for which we > don't have per-node counters. > > The same applies to the documentation. Please don't duplicate the list > of items, but have memory.numa_stat refer to the list for memory.stat. > You can add (not in memory.numa_stat) or something to percpu and sock. Thanks for your suggestions. > > > +static unsigned long memcg_node_page_state(struct mem_cgroup *memcg, > > + unsigned int nid, > > + enum node_stat_item idx) > > +{ > > + VM_BUG_ON(nid >= nr_node_ids); > > + return lruvec_page_state(mem_cgroup_lruvec(memcg, NODE_DATA(nid)), idx); > > +} > > Please drop this wrapper and use lruvec_page_state directly below. > > Otherwise, this looks reasonable to me. OK. Will do that. -- Yours, Muchun