From: Shakeel Butt <shakeel.butt@linux.dev>
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Yosry Ahmed, T.J. Mercier
Cc: kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 4/8] memcg: reduce memory for the lruvec and memcg stats
Date: Mon, 29 Apr 2024 23:06:08 -0700
Message-ID: <20240430060612.2171650-5-shakeel.butt@linux.dev>
In-Reply-To: <20240430060612.2171650-1-shakeel.butt@linux.dev>
References: <20240430060612.2171650-1-shakeel.butt@linux.dev>

At the moment, the amount of memory allocated for the stats-related
structs in the mem_cgroup corresponds to the size of enum
node_stat_item. However, not all fields in enum node_stat_item have
corresponding memcg stats, so let's use an indirection mechanism
similar to the one used for memcg vmstats management.

For a given x86_64 config, the sizes of these structs with and without
the patch are:

	struct				size in bytes
					w/o	with
	struct lruvec_stats		1128	 648
	struct lruvec_stats_percpu	 752	 432
	struct memcg_vmstats		1832	1352
	struct memcg_vmstats_percpu	1280	 960

The memory savings are further compounded by the fact that these
structs are allocated for each CPU and for each node. To be precise,
for each memcg the memory saved is:

	Memory saved = ((21 * 3 * NR_NODES) + (21 * 2 * NR_NODES * NR_CPUS) +
			(21 * 3) + (21 * 2 * NR_CPUS)) * sizeof(long)

where 21 is the number of fields eliminated.
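As a rough illustration (not part of the patch): the following
standalone userspace sketch mimics the indirection scheme with made-up
item names, and plugs an assumed 2-node, 64-CPU configuration into the
savings formula above.

	#include <stdint.h>
	#include <stdio.h>

	/* Hypothetical sparse enum; only some items are tracked. */
	enum sparse_item { ITEM_A, ITEM_B, ITEM_C, ITEM_D, ITEM_E,
			   NR_SPARSE_ITEMS };

	static const unsigned int tracked_items[] = { ITEM_A, ITEM_C, ITEM_E };
	#define NR_TRACKED (sizeof(tracked_items) / sizeof(tracked_items[0]))

	/* Maps a sparse enum value to a compact slot; 0 = not tracked. */
	static int8_t item_index[NR_SPARSE_ITEMS];

	static void init_index(void)
	{
		unsigned int i;
		int8_t j = 0;

		/* Offset by one so an untouched slot (0) decodes to -1. */
		for (i = 0; i < NR_TRACKED; i++)
			item_index[tracked_items[i]] = ++j;
	}

	static int stats_index(int idx)
	{
		return item_index[idx] - 1;
	}

	int main(void)
	{
		long nr_nodes = 2, nr_cpus = 64, eliminated = 21;
		long longs_saved;

		init_index();
		/* Tracked item gets a dense slot; untracked decodes to -1. */
		printf("ITEM_C -> %d, ITEM_B -> %d\n",
		       stats_index(ITEM_C), stats_index(ITEM_B)); /* 1, -1 */

		/* The savings formula above, with sizeof(long) == 8. */
		longs_saved = eliminated * 3 * nr_nodes +
			      eliminated * 2 * nr_nodes * nr_cpus +
			      eliminated * 3 +
			      eliminated * 2 * nr_cpus;
		printf("%ld bytes saved per memcg\n",
		       longs_saved * (long)sizeof(long)); /* 66024 */
		return 0;
	}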
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
Changes since v2:
- N/A

 mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 115 insertions(+), 23 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 434cff91b65e..f424c5b2ba9b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -576,35 +576,105 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 	return mz;
 }
 
+/* Subset of node_stat_item for memcg stats */
+static const unsigned int memcg_node_stat_items[] = {
+	NR_INACTIVE_ANON,
+	NR_ACTIVE_ANON,
+	NR_INACTIVE_FILE,
+	NR_ACTIVE_FILE,
+	NR_UNEVICTABLE,
+	NR_SLAB_RECLAIMABLE_B,
+	NR_SLAB_UNRECLAIMABLE_B,
+	WORKINGSET_REFAULT_ANON,
+	WORKINGSET_REFAULT_FILE,
+	WORKINGSET_ACTIVATE_ANON,
+	WORKINGSET_ACTIVATE_FILE,
+	WORKINGSET_RESTORE_ANON,
+	WORKINGSET_RESTORE_FILE,
+	WORKINGSET_NODERECLAIM,
+	NR_ANON_MAPPED,
+	NR_FILE_MAPPED,
+	NR_FILE_PAGES,
+	NR_FILE_DIRTY,
+	NR_WRITEBACK,
+	NR_SHMEM,
+	NR_SHMEM_THPS,
+	NR_FILE_THPS,
+	NR_ANON_THPS,
+	NR_KERNEL_STACK_KB,
+	NR_PAGETABLE,
+	NR_SECONDARY_PAGETABLE,
+#ifdef CONFIG_SWAP
+	NR_SWAPCACHE,
+#endif
+};
+
+static const unsigned int memcg_stat_items[] = {
+	MEMCG_SWAP,
+	MEMCG_SOCK,
+	MEMCG_PERCPU_B,
+	MEMCG_VMALLOC,
+	MEMCG_KMEM,
+	MEMCG_ZSWAP_B,
+	MEMCG_ZSWAPPED,
+};
+
+#define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
+#define NR_MEMCG_STATS (NR_MEMCG_NODE_STAT_ITEMS + ARRAY_SIZE(memcg_stat_items))
+static int8_t mem_cgroup_stats_index[MEMCG_NR_STAT] __read_mostly;
+
+static void init_memcg_stats(void)
+{
+	int8_t i, j = 0;
+
+	/* Switch to short once this failure occurs. */
+	BUILD_BUG_ON(NR_MEMCG_STATS >= 127 /* INT8_MAX */);
+
+	for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; ++i)
+		mem_cgroup_stats_index[memcg_node_stat_items[i]] = ++j;
+
+	for (i = 0; i < ARRAY_SIZE(memcg_stat_items); ++i)
+		mem_cgroup_stats_index[memcg_stat_items[i]] = ++j;
+}
+
+static inline int memcg_stats_index(int idx)
+{
+	return mem_cgroup_stats_index[idx] - 1;
+}
+
 struct lruvec_stats_percpu {
 	/* Local (CPU and cgroup) state */
-	long state[NR_VM_NODE_STAT_ITEMS];
+	long state[NR_MEMCG_NODE_STAT_ITEMS];
 
 	/* Delta calculation for lockless upward propagation */
-	long state_prev[NR_VM_NODE_STAT_ITEMS];
+	long state_prev[NR_MEMCG_NODE_STAT_ITEMS];
 };
 
 struct lruvec_stats {
 	/* Aggregated (CPU and subtree) state */
-	long state[NR_VM_NODE_STAT_ITEMS];
+	long state[NR_MEMCG_NODE_STAT_ITEMS];
 
 	/* Non-hierarchical (CPU aggregated) state */
-	long state_local[NR_VM_NODE_STAT_ITEMS];
+	long state_local[NR_MEMCG_NODE_STAT_ITEMS];
 
 	/* Pending child counts during tree propagation */
-	long state_pending[NR_VM_NODE_STAT_ITEMS];
+	long state_pending[NR_MEMCG_NODE_STAT_ITEMS];
 };
 
 unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx)
 {
 	struct mem_cgroup_per_node *pn;
-	long x;
+	long x = 0;
+	int i;
 
 	if (mem_cgroup_disabled())
 		return node_page_state(lruvec_pgdat(lruvec), idx);
 
-	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	x = READ_ONCE(pn->lruvec_stats->state[idx]);
+	i = memcg_stats_index(idx);
+	if (i >= 0) {
+		pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+		x = READ_ONCE(pn->lruvec_stats->state[i]);
+	}
 #ifdef CONFIG_SMP
 	if (x < 0)
 		x = 0;
@@ -617,12 +687,16 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 {
 	struct mem_cgroup_per_node *pn;
 	long x = 0;
+	int i;
 
 	if (mem_cgroup_disabled())
 		return node_page_state(lruvec_pgdat(lruvec), idx);
 
-	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	x = READ_ONCE(pn->lruvec_stats->state_local[idx]);
+	i = memcg_stats_index(idx);
+	if (i >= 0) {
+		pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+		x = READ_ONCE(pn->lruvec_stats->state_local[i]);
+	}
 #ifdef CONFIG_SMP
 	if (x < 0)
 		x = 0;
@@ -689,11 +763,11 @@ struct memcg_vmstats_percpu {
 	/* The above should fit a single cacheline for memcg_rstat_updated() */
 
 	/* Local (CPU and cgroup) page state & events */
-	long state[MEMCG_NR_STAT];
+	long state[NR_MEMCG_STATS];
 	unsigned long events[NR_MEMCG_EVENTS];
 
 	/* Delta calculation for lockless upward propagation */
-	long state_prev[MEMCG_NR_STAT];
+	long state_prev[NR_MEMCG_STATS];
 	unsigned long events_prev[NR_MEMCG_EVENTS];
 
 	/* Cgroup1: threshold notifications & softlimit tree updates */
@@ -703,15 +777,15 @@ struct memcg_vmstats_percpu {
 
 struct memcg_vmstats {
 	/* Aggregated (CPU and subtree) page state & events */
-	long state[MEMCG_NR_STAT];
+	long state[NR_MEMCG_STATS];
 	unsigned long events[NR_MEMCG_EVENTS];
 
 	/* Non-hierarchical (CPU aggregated) page state & events */
-	long state_local[MEMCG_NR_STAT];
+	long state_local[NR_MEMCG_STATS];
 	unsigned long events_local[NR_MEMCG_EVENTS];
 
 	/* Pending child counts during tree propagation */
-	long state_pending[MEMCG_NR_STAT];
+	long state_pending[NR_MEMCG_STATS];
 	unsigned long events_pending[NR_MEMCG_EVENTS];
 
 	/* Stats updates since the last flush */
@@ -844,7 +918,13 @@ static void flush_memcg_stats_dwork(struct work_struct *w)
 
 unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
 {
-	long x = READ_ONCE(memcg->vmstats->state[idx]);
+	long x;
+	int i = memcg_stats_index(idx);
+
+	if (i < 0)
+		return 0;
+
+	x = READ_ONCE(memcg->vmstats->state[i]);
 #ifdef CONFIG_SMP
 	if (x < 0)
 		x = 0;
@@ -876,18 +956,25 @@ static int memcg_state_val_in_pages(int idx, int val)
  */
 void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
 {
-	if (mem_cgroup_disabled())
+	int i = memcg_stats_index(idx);
+
+	if (mem_cgroup_disabled() || i < 0)
 		return;
 
-	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
+	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
 	memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item.
  */
 static unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
 {
-	long x = READ_ONCE(memcg->vmstats->state_local[idx]);
+	long x;
+	int i = memcg_stats_index(idx);
+
+	if (i < 0)
+		return 0;
+	x = READ_ONCE(memcg->vmstats->state_local[i]);
 #ifdef CONFIG_SMP
 	if (x < 0)
 		x = 0;
@@ -901,6 +988,10 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 {
 	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup *memcg;
+	int i = memcg_stats_index(idx);
+
+	if (i < 0)
+		return;
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
@@ -930,10 +1021,10 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 	}
 
 	/* Update memcg */
-	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
+	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
 
 	/* Update lruvec */
-	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
+	__this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
 
 	memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
 	memcg_stats_unlock();
@@ -5702,6 +5793,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
 	} else {
+		init_memcg_stats();
 		init_memcg_events();
 		page_counter_init(&memcg->memory, NULL);
 		page_counter_init(&memcg->swap, NULL);
@@ -5873,7 +5965,7 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 
 	statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
 
-	for (i = 0; i < MEMCG_NR_STAT; i++) {
+	for (i = 0; i < NR_MEMCG_STATS; i++) {
 		/*
 		 * Collect the aggregated propagation counts of groups
 		 * below us. We're in a per-cpu loop here and this is
@@ -5937,7 +6029,7 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 
 		lstatc = per_cpu_ptr(pn->lruvec_stats_percpu, cpu);
 
-		for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
+		for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; i++) {
 			delta = lstats->state_pending[i];
 			if (delta)
 				lstats->state_pending[i] = 0;
-- 
2.43.0