Date: Wed, 21 Jun 2023 18:04:52 +0000
In-Reply-To: <20230621180454.973862-1-yuanchu@google.com>
Mime-Version: 1.0
References: <20230621180454.973862-1-yuanchu@google.com>
X-Mailer: git-send-email 2.41.0.162.gfafddb0af9-goog
Message-ID: <20230621180454.973862-5-yuanchu@google.com>
Subject: [RFC PATCH v2 4/6] mm: extend working set reporting to memcgs
From: Yuanchu Xie
To: Greg Kroah-Hartman, "Rafael J. Wysocki", "Michael S. Tsirkin",
    David Hildenbrand, Jason Wang, Andrew Morton, Johannes Weiner,
    Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Yu Zhao,
    Kefeng Wang, Kairui Song, Yosry Ahmed, Yuanchu Xie, "T. J. Alumbaugh"
Cc: Wei Xu, SeongJae Park, Sudarshan Rajagopalan, kai.huang@intel.com,
    hch@lst.de, jon@nutanix.com, Aneesh Kumar K V, Matthew Wilcox,
    Vasily Averin, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
    cgroups@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"

Break down the system-wide working set reporting into per-memcg reports,
each of which hierarchically aggregates its children. The per-node working
set histograms and the refresh/report threshold files are exposed as memcg
files; each file shows a report covering all memory nodes. A sketch of how
userspace might consume these files is appended after the patch trailer.

Signed-off-by: T.J. Alumbaugh
Signed-off-by: Yuanchu Xie
---
 include/linux/memcontrol.h |   6 +
 include/linux/wsr.h        |   4 +
 mm/memcontrol.c            | 262 ++++++++++++++++++++++++++++++++++++-
 mm/vmscan.c                |   9 +-
 4 files changed, 277 insertions(+), 4 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 85dc9b88ea379..96971aa6a48cd 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -10,6 +10,7 @@
 #ifndef _LINUX_MEMCONTROL_H
 #define _LINUX_MEMCONTROL_H
 
+#include
 #include
 #include
 #include
@@ -325,6 +326,11 @@ struct mem_cgroup {
 	struct lru_gen_mm_list mm_list;
 #endif
 
+#ifdef CONFIG_WSR
+	int wsr_event;
+	wait_queue_head_t wsr_wait_queue;
+#endif
+
 	struct mem_cgroup_per_node *nodeinfo[];
 };
 
diff --git a/include/linux/wsr.h b/include/linux/wsr.h
index 85c901ce026b9..d45f7cc0672ac 100644
--- a/include/linux/wsr.h
+++ b/include/linux/wsr.h
@@ -48,6 +48,7 @@ ssize_t wsr_intervals_ms_parse(char *src, struct ws_bin *bins);
  */
 void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root,
 		 struct pglist_data *pgdat);
+void report_ws(struct pglist_data *pgdat, struct scan_control *sc);
 #else
 struct ws_bin;
 struct wsr;
@@ -73,6 +74,9 @@ static inline void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root,
 			       struct pglist_data *pgdat)
 {
 }
+static inline void report_ws(struct pglist_data *pgdat, struct scan_control *sc)
+{
+}
 #endif /* CONFIG_WSR */
 
 #endif /* _LINUX_WSR_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2eee092f8f119..edf5bb31bb19c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -25,6 +25,7 @@
  * Copyright (C) 2020 Alibaba, Inc, Alex Shi
  */
 
+#include
 #include
 #include
 #include
@@ -65,6 +66,7 @@
 #include
 #include "internal.h"
 #include
+#include
 #include
 #include "slab.h"
 #include "swap.h"
@@ -5233,6 +5235,7 @@ static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return;
 
+	wsr_destroy(&pn->lruvec);
 	free_percpu(pn->lruvec_stats_percpu);
 	kfree(pn);
 }
@@ -5311,6 +5314,10 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	spin_lock_init(&memcg->deferred_split_queue.split_queue_lock);
 	INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue);
 	memcg->deferred_split_queue.split_queue_len = 0;
+#endif
+#ifdef CONFIG_WSR
+	memcg->wsr_event = 0;
+	init_waitqueue_head(&memcg->wsr_wait_queue);
 #endif
 	idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
 	lru_gen_init_memcg(memcg);
@@ -5411,6 +5418,11 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	}
 	spin_unlock_irq(&memcg->event_list_lock);
 
+#ifdef CONFIG_WSR
+	wake_up_pollfree(&memcg->wsr_wait_queue);
+	synchronize_rcu();
+#endif
+
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
@@ -6642,6 +6654,228 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	return nbytes;
 }
 
+#ifdef CONFIG_WSR
+static int memory_wsr_intervals_ms_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr *wsr;
+		struct ws_bin *bin;
+
+		wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+		mutex_lock(&wsr->bins_lock);
+		seq_printf(m, "N%d=", nid);
+		for (bin = wsr->bins; bin->idle_age != -1; bin++)
+			seq_printf(m, "%u,", jiffies_to_msecs(bin->idle_age));
+		mutex_unlock(&wsr->bins_lock);
+
+		seq_printf(m, "%lld ", LLONG_MAX);
+	}
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static ssize_t memory_wsr_intervals_ms_parse(struct kernfs_open_file *of,
+					     char *buf, size_t nbytes,
+					     unsigned int *nid_out,
+					     struct ws_bin *bins)
+{
+	char *node, *intervals;
+	unsigned int nid;
+	int err;
+
+	buf = strstrip(buf);
+	intervals = buf;
+	node = strsep(&intervals, "=");
+
+	if (*node != 'N')
+		return -EINVAL;
+
+	err = kstrtouint(node + 1, 0, &nid);
+	if (err)
+		return err;
+
+	if (nid >= nr_node_ids || !node_state(nid, N_MEMORY))
+		return -EINVAL;
+
+	err = wsr_intervals_ms_parse(intervals, bins);
+	if (err)
+		return err;
+
+	*nid_out = nid;
+	return 0;
+}
+
+static ssize_t memory_wsr_intervals_ms_write(struct kernfs_open_file *of,
+					     char *buf, size_t nbytes,
+					     loff_t off)
+{
+	unsigned int nid;
+	int err;
+	struct wsr *wsr;
+	struct ws_bin *bins;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	bins = kzalloc(sizeof(wsr->bins), GFP_KERNEL);
+	if (!bins)
+		return -ENOMEM;
+
+	err = memory_wsr_intervals_ms_parse(of, buf, nbytes, &nid, bins);
+	if (err)
+		goto failed;
+
+	wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+	mutex_lock(&wsr->bins_lock);
+	memcpy(wsr->bins, bins, sizeof(wsr->bins));
+	mutex_unlock(&wsr->bins_lock);
+failed:
+	kfree(bins);
+	return err ?: nbytes;
+}
+
+static int memory_wsr_refresh_ms_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr *wsr =
+			lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+
+		seq_printf(m, "N%d=%u ", nid,
+			   jiffies_to_msecs(READ_ONCE(wsr->refresh_threshold)));
+	}
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static ssize_t memory_wsr_threshold_parse(char *buf, size_t nbytes,
+					  unsigned int *nid_out,
+					  unsigned int *msecs)
+{
+	char *node, *threshold;
+	unsigned int nid;
+	int err;
+
+	buf = strstrip(buf);
+	threshold = buf;
+	node = strsep(&threshold, "=");
+
+	if (*node != 'N')
+		return -EINVAL;
+
+	err = kstrtouint(node + 1, 0, &nid);
+	if (err)
+		return err;
+
+	if (nid >= nr_node_ids || !node_state(nid, N_MEMORY))
+		return -EINVAL;
+
+	err = kstrtouint(threshold, 0, msecs);
+	if (err)
+		return err;
+
+	*nid_out = nid;
+
+	return nbytes;
+}
+
+static ssize_t memory_wsr_refresh_ms_write(struct kernfs_open_file *of,
+					   char *buf, size_t nbytes, loff_t off)
+{
+	unsigned int nid, msecs;
+	struct wsr *wsr;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	ssize_t ret = memory_wsr_threshold_parse(buf, nbytes, &nid,
+						 &msecs);
+
+	if (ret < 0)
+		return ret;
+
+	wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+	WRITE_ONCE(wsr->refresh_threshold, msecs_to_jiffies(msecs));
+	return ret;
+}
+
+static int memory_wsr_report_ms_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr *wsr =
+			lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+
+		seq_printf(m, "N%d=%u ", nid,
+			   jiffies_to_msecs(READ_ONCE(wsr->report_threshold)));
+	}
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static ssize_t memory_wsr_report_ms_write(struct kernfs_open_file *of,
+					  char *buf, size_t nbytes, loff_t off)
+{
+	unsigned int nid, msecs;
+	struct wsr *wsr;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	ssize_t ret = memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs);
+
+	if (ret < 0)
+		return ret;
+
+	wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+	WRITE_ONCE(wsr->report_threshold, msecs_to_jiffies(msecs));
+	return ret;
+}
+
+static int memory_wsr_histogram_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr *wsr =
+			lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+		struct ws_bin *bin;
+
+		seq_printf(m, "N%d\n", nid);
+
+		mutex_lock(&wsr->bins_lock);
+		wsr_refresh(wsr, memcg, NODE_DATA(nid));
+		for (bin = wsr->bins; bin->idle_age != -1; bin++)
+			seq_printf(m, "%u anon=%lu file=%lu\n",
+				   jiffies_to_msecs(bin->idle_age),
+				   bin->nr_pages[0], bin->nr_pages[1]);
+
+		seq_printf(m, "%lld anon=%lu file=%lu\n", LLONG_MAX,
+			   bin->nr_pages[0], bin->nr_pages[1]);
+
+		mutex_unlock(&wsr->bins_lock);
+	}
+
+	return 0;
+}
+
+__poll_t memory_wsr_histogram_poll(struct kernfs_open_file *of,
+				   struct poll_table_struct *pt)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	if (memcg->css.flags & CSS_DYING)
+		return DEFAULT_POLLMASK;
+
+	poll_wait(of->file, &memcg->wsr_wait_queue, pt);
+	if (cmpxchg(&memcg->wsr_event, 1, 0) == 1)
+		return DEFAULT_POLLMASK | EPOLLPRI;
+	return DEFAULT_POLLMASK;
+}
+#endif
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -6710,7 +6944,33 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NS_DELEGATABLE,
 		.write = memory_reclaim,
 	},
-	{ }	/* terminate */
+#ifdef CONFIG_WSR
+	{
+		.name = "wsr.intervals_ms",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_wsr_intervals_ms_show,
+		.write = memory_wsr_intervals_ms_write,
+	},
+	{
+		.name = "wsr.refresh_ms",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_wsr_refresh_ms_show,
+		.write = memory_wsr_refresh_ms_write,
+	},
+	{
+		.name = "wsr.report_ms",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_wsr_report_ms_show,
+		.write = memory_wsr_report_ms_write,
+	},
+	{
+		.name = "wsr.histogram",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_wsr_histogram_show,
+		.poll = memory_wsr_histogram_poll,
+	},
+#endif
+	{}	/* terminate */
 };
 
 struct cgroup_subsys memory_cgrp_subsys = {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c56fddcec88fb..ba254b6e91e19 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4559,8 +4559,6 @@ static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, unsigned
 	return true;
 }
 
-static void report_ws(struct pglist_data *pgdat, struct scan_control *sc);
-
 /* to protect the working set of the last N jiffies */
 static unsigned long lru_gen_min_ttl __read_mostly;
@@ -5937,7 +5935,7 @@ void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root,
 	}
 }
 
-static void report_ws(struct pglist_data *pgdat, struct scan_control *sc)
+void report_ws(struct pglist_data *pgdat, struct scan_control *sc)
 {
 	static DEFINE_RATELIMIT_STATE(rate, HZ, 3);
@@ -5969,6 +5967,8 @@ static void report_ws(struct pglist_data *pgdat, struct scan_control *sc)
 
 	if (wsr->notifier)
 		kernfs_notify(wsr->notifier);
+	if (memcg && cmpxchg(&memcg->wsr_event, 0, 1) == 0)
+		wake_up_interruptible(&memcg->wsr_wait_queue);
 }
 #endif /* CONFIG_WSR */
 
@@ -6486,6 +6486,9 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
+
+		if (!sc->proactive)
+			report_ws(zone->zone_pgdat, sc);
 		shrink_node(zone->zone_pgdat, sc);
 	}
-- 
2.41.0.162.gfafddb0af9-goog
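
Not part of the patch: a minimal userspace sketch of how the files added
above might be consumed. The cgroup v2 mount point and the "example" memcg
name are assumptions; the "N<node>=<value>" write format, the histogram
layout ("N<node>" followed by "<idle_age_ms> anon=<pages> file=<pages>"
lines), and the POLLPRI wakeup follow memory_wsr_threshold_parse(),
memory_wsr_histogram_show(), and memory_wsr_histogram_poll() in this patch.
Threshold semantics come from earlier patches in the series.

/*
 * Hypothetical consumer of memory.wsr.* for one memcg on node 0.
 * Paths and values are illustrative only.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	/* "example" is a placeholder; point this at a real delegated memcg. */
	const char *base = "/sys/fs/cgroup/example";
	char path[256], buf[4096];
	struct pollfd pfd;
	ssize_t len;

	/* Per-node thresholds in milliseconds, written as "N<node>=<ms>". */
	snprintf(path, sizeof(path), "%s/memory.wsr.refresh_ms", base);
	write_str(path, "N0=5000");
	snprintf(path, sizeof(path), "%s/memory.wsr.report_ms", base);
	write_str(path, "N0=2000");

	snprintf(path, sizeof(path), "%s/memory.wsr.histogram", base);
	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return 1;
	pfd.events = POLLPRI;

	for (;;) {
		/* report_ws() sets memcg->wsr_event and wakes pollers. */
		if (poll(&pfd, 1, -1) <= 0)
			break;
		lseek(pfd.fd, 0, SEEK_SET);
		len = read(pfd.fd, buf, sizeof(buf) - 1);
		if (len <= 0)
			break;
		buf[len] = '\0';
		/* One "N<node>" header per node, then one line per bin. */
		fputs(buf, stdout);
	}
	close(pfd.fd);
	return 0;
}

The POLLPRI revent here corresponds to the wake_up_interruptible() issued
from report_ws() once a memcg's report threshold is exceeded, so the reader
only wakes when a fresh working set report is worth fetching.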