Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp6407144rwb; Tue, 22 Nov 2022 12:58:47 -0800 (PST) X-Google-Smtp-Source: AA0mqf4pfTy4cv0QHerBJwjhy1l1XEGPOtu76w2w8xw2z2oQ2TVKsftcf49PT87Gxw7Gzb09lJh5 X-Received: by 2002:a17:902:7d94:b0:188:f4f4:2000 with SMTP id a20-20020a1709027d9400b00188f4f42000mr18813622plm.173.1669150727193; Tue, 22 Nov 2022 12:58:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669150727; cv=none; d=google.com; s=arc-20160816; b=ZjgHQjMyMIFwj7NNi6O5rA2xZ6RyNwyaUSFY/xyfehPL8XtpC+wuN3mMmy2E79rHKH Zuysyi4evEb+/biRGNHkznLODXr27Gss5qdbbsXJOG07fjruL3Z895ThVnP2Crt+ZMUj w7+vSFrTOnOr/fRe+xTbq6fKj8swnc5fIi07KJgtQxEffgfa351H+/wru2U0fi8k8iJc 8tCK22xCc5ylFV/vOqDAHjwQw8RokUHBVPD5X98XSVjnqX67A1vgb3+RjDHgwvMfVcYV NOHW9jGu3WreX2yDK14UjznbV4RsW/11mzl6sG5Ng1+pCLDBNm/fJWVaasefYlRVTsjF yKTg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=BSER/B4DnSDEXzptnYVRvY0Bxd0pX06XJiUIMZfJs1c=; b=K7ayD1Xwu/RcxVbG5E0xbBkTvxucpwciirNmK6TPW8HNkeTmWrU0NuZpVBjptdgQSt xGPJaKfvsuNGsI19EaPHL0m/ig2nAB+tMBNrpxCJL0o7dcXLP6suxDsOi/PEEtkqoyij UZ8c4hBYnzZu9CoHNFUl988MZEYyCbkBdlB8nhcsn+l9azmM7hBPSSeGrKZ5ZgjoG+63 lKsjzsVPnD1Tnp3xHPpcYdLWc0SVz6P0wJYX5iOibQlnSr2zk92N4OYaiG3pVLJTkfwJ pThAa+FZphE8kJ9NxdmBLFSKW0/muzlfthAp3KEQf0vJRnbyX/u00F3WCMJF6t67MvsB E/fA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Q8CGtJa+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a1-20020a631a01000000b004499643a02esi15745526pga.248.2022.11.22.12.58.35; Tue, 22 Nov 2022 12:58:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Q8CGtJa+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234818AbiKVUj0 (ORCPT + 90 others); Tue, 22 Nov 2022 15:39:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45432 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234809AbiKVUjS (ORCPT ); Tue, 22 Nov 2022 15:39:18 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 90AF1697EA for ; Tue, 22 Nov 2022 12:39:16 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-352e29ff8c2so151581217b3.21 for ; Tue, 22 Nov 2022 12:39:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=BSER/B4DnSDEXzptnYVRvY0Bxd0pX06XJiUIMZfJs1c=; b=Q8CGtJa+O00lg9Tqtd2xc0jsOMGOiJJRQKDl+HbQ4+RJuR4NE+I2vLParTPRJyK9dV HeSYxODpIYH636a5cGLJjdX3SPGMLDqJjaFcJ39VsuEELq4DgDIZX//2wGqsFd51QSJ+ AZM/FTrtKnbmUqDMcYElZ3IbHtiXrin84UOmwRdMpGISPBEI8woUDxAgbVU4ym/RyNK7 VWuGFbedYwWGh89zIthlnJvT8GU/8qlI9jP4fZERGstjvqu1lGm4VzeB6c1hICq2esGq a6HaxwII49NO4LMfOWNuAy3MP55sIxndD9RIDojCzutzGpi9VdKps5EppZgBJUct3an9 k1NA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=BSER/B4DnSDEXzptnYVRvY0Bxd0pX06XJiUIMZfJs1c=; b=aMS0hs/aCTOXXbbpCdmg9nn71/RBa5QRfveLYwMiEQnk5GijH0yvqHfw/rmaREvH7N JTge7WMlI1VprfmPDcIVaHiCGYgJJLCHhGEly5hzyhA40WgXufDgG75aJSjz3R1tj5kQ wTU4z2oBbkknUFmsLmdyCPWOrutbc6WozCL2vAXf+RsxzXxdo4gurcplQif425vA48Vr ouGm9vnzjQnNbjjqqSc7bzL31cC4riv4zZjItMdyGEGncdCq4vzZwpAQHU5K6hzuqWsO VMQSgumuIEuyiemv6ksdDvNOpi0pB3ENz21KkVxszsUBpr8ZRtLMv/o2znttBKtuLIOE pcNA== X-Gm-Message-State: ANoB5plmx14KtiGvtvLxCI94Ibs4LfZodTKqhjatB9shjkLuoyDMPIQK oPno04d9Vi/h+o7UlZRF7VTo8CWzNxZt1mv4KA== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2d4:203:b7d0:497d:abe4:3e6f]) (user=almasrymina job=sendgmr) by 2002:a25:7450:0:b0:6ee:8d5a:3bca with SMTP id p77-20020a257450000000b006ee8d5a3bcamr4305851ybc.300.1669149555882; Tue, 22 Nov 2022 12:39:15 -0800 (PST) Date: Tue, 22 Nov 2022 12:38:48 -0800 In-Reply-To: <20221122203850.2765015-1-almasrymina@google.com> Mime-Version: 1.0 References: <20221122203850.2765015-1-almasrymina@google.com> X-Mailer: git-send-email 2.38.1.584.g0f3c55d4c2-goog Message-ID: <20221122203850.2765015-4-almasrymina@google.com> Subject: [RFC PATCH v1 4/4] mm: Add nodes= arg to memory.demote From: Mina Almasry To: Huang Ying , Yang Shi , Yosry Ahmed , Tim Chen , weixugc@google.com, shakeelb@google.com, gthelen@google.com, fvdl@google.com, Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Andrew Morton Cc: Mina Almasry , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The nodes= arg instructs the kernel to only scan the given nodes for demotion. For example use cases, consider a 3 tier memory system: nodes 0,1 -> top tier nodes 2,3 -> second tier nodes 4,5 -> third tier echo "1m nodes=2,3" > memory.demote This instructs the kernel to attempt to demote 1m memory in the second tier to the third, which can be desirable according to the userspace policy if the second tier is filling up and there is available memory on the third tier. echo "1m" > memory.demote Instructs the kernel to attempt to demote 1m of memory (regardless of which tier the memory is currently on). echo "1m nodes=0,1" Instructs the kernel to demote memory from the top tier nodes, which can be desirable according to the userspace policy if there is pressure on the top tiers. Signed-off-by: Mina Almasry --- include/linux/swap.h | 3 ++- mm/memcontrol.c | 64 ++++++++++++++++++++++++++++++++++++-------- mm/vmscan.c | 4 ++- 3 files changed, 58 insertions(+), 13 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index f768171c2dc2..e195ee5f8efb 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -425,7 +425,8 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, - unsigned int reclaim_options); + unsigned int reclaim_options, + nodemask_t nodemask); extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem, gfp_t gfp_mask, bool noswap, pg_data_t *pgdat, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 427c79e467eb..cce446348358 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -63,6 +63,7 @@ #include #include #include +#include #include "internal.h" #include #include @@ -2392,7 +2393,8 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg, psi_memstall_enter(&pflags); nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, - MEMCG_RECLAIM_DEFAULT); + MEMCG_RECLAIM_DEFAULT, + NODE_MASK_ALL); psi_memstall_leave(&pflags); } while ((memcg = parent_mem_cgroup(memcg)) && !mem_cgroup_is_root(memcg)); @@ -2683,7 +2685,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, psi_memstall_enter(&pflags); nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages, - gfp_mask, reclaim_options); + gfp_mask, reclaim_options, + NODE_MASK_ALL); psi_memstall_leave(&pflags); if (mem_cgroup_margin(mem_over_limit) >= nr_pages) @@ -3504,7 +3507,8 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg, if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, memsw ? MEMCG_RECLAIM_NO_SWAP : - MEMCG_RECLAIM_DEFAULT)) { + MEMCG_RECLAIM_DEFAULT, + NODE_MASK_ALL)) { ret = -EBUSY; break; } @@ -3615,7 +3619,8 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg) return -EINTR; if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, - MEMCG_RECLAIM_DEFAULT)) + MEMCG_RECLAIM_DEFAULT, + NODE_MASK_ALL)) nr_retries--; } @@ -6408,7 +6413,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, } reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high, - GFP_KERNEL, MEMCG_RECLAIM_DEFAULT); + GFP_KERNEL, MEMCG_RECLAIM_DEFAULT, + NODE_MASK_ALL); if (!reclaimed && !nr_retries--) break; @@ -6457,7 +6463,8 @@ static ssize_t memory_max_write(struct kernfs_open_file *of, if (nr_reclaims) { if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max, - GFP_KERNEL, MEMCG_RECLAIM_DEFAULT)) + GFP_KERNEL, MEMCG_RECLAIM_DEFAULT, + NODE_MASK_ALL)) nr_reclaims--; continue; } @@ -6612,7 +6619,8 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_to_reclaim - nr_reclaimed, - GFP_KERNEL, reclaim_options); + GFP_KERNEL, reclaim_options, + NODE_MASK_ALL); if (!reclaimed && !nr_retries--) return -EAGAIN; @@ -6623,6 +6631,16 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, return nbytes; } +enum { + MEMORY_DEMOTE_NODES = 0, + MEMORY_DEMOTE_NULL, +}; + +static const match_table_t if_tokens = { + { MEMORY_DEMOTE_NODES, "nodes=%s" }, + { MEMORY_DEMOTE_NULL, NULL }, +}; + static ssize_t memory_demote(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { @@ -6631,11 +6649,35 @@ static ssize_t memory_demote(struct kernfs_open_file *of, char *buf, unsigned long nr_to_demote, nr_demoted = 0; unsigned int reclaim_options = MEMCG_RECLAIM_ONLY_DEMOTE; int err; + char *old_buf, *start; + substring_t args[MAX_OPT_ARGS]; + int token; + char value[256]; + nodemask_t nodemask = NODE_MASK_ALL; buf = strstrip(buf); - err = page_counter_memparse(buf, "", &nr_to_demote); - if (err) - return err; + old_buf = buf; + nr_to_demote = memparse(buf, &buf) / PAGE_SIZE; + if (buf == old_buf) + return -EINVAL; + + buf = strstrip(buf); + + while ((start = strsep(&buf, " ")) != NULL) { + if (!strlen(start)) + continue; + token = match_token(start, if_tokens, args); + match_strlcpy(value, args, sizeof(value)); + switch (token) { + case MEMORY_DEMOTE_NODES: + err = nodelist_parse(value, nodemask); + if (err < 0) + return -EINVAL; + break; + default: + return -EINVAL; + } + } while (nr_demoted < nr_to_demote) { unsigned long demoted; @@ -6645,7 +6687,7 @@ static ssize_t memory_demote(struct kernfs_open_file *of, char *buf, demoted = try_to_free_mem_cgroup_pages( memcg, nr_to_demote - nr_demoted, GFP_KERNEL, - reclaim_options); + reclaim_options, nodemask); if (!demoted && !nr_retries--) return -EAGAIN; diff --git a/mm/vmscan.c b/mm/vmscan.c index d7e509b3f07f..df5ade259b3a 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -6719,7 +6719,8 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg, unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, - unsigned int reclaim_options) + unsigned int reclaim_options, + nodemask_t nodemask) { unsigned long nr_reclaimed; unsigned int noreclaim_flag; @@ -6734,6 +6735,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, .may_unmap = 1, .may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP), .proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE), + .nodemask = &nodemask, }; /* * Traverse the ZONELIST_FALLBACK zonelist of the current node to put -- 2.38.1.584.g0f3c55d4c2-goog