Subject: Re: [PATCH 4/6] mm: introduce per-node proactive reclaim interface
From: Tim Chen
To: Davidlohr Bueso, linux-mm@kvack.org
Cc: mhocko@kernel.org, akpm@linux-foundation.org, rientjes@google.com,
 yosryahmed@google.com, hannes@cmpxchg.org, shakeelb@google.com,
 dave.hansen@linux.intel.com, roman.gushchin@linux.dev, gthelen@google.com,
 a.manzanares@samsung.com, heekwon.p@samsung.com, gim.jongmin@samsung.com,
 linux-kernel@vger.kernel.org
Date: Mon, 18 Apr 2022 17:00:55 -0700
In-Reply-To: <20220416053902.68517-5-dave@stgolabs.net>
References: <20220416053902.68517-1-dave@stgolabs.net>
 <20220416053902.68517-5-dave@stgolabs.net>

On Fri, 2022-04-15 at 22:39 -0700, Davidlohr Bueso wrote:
> This patch introduces a mechanism to trigger memory reclaim
> as a per-node sysfs interface, inspired by compaction's
> equivalent; ie:
>
> echo 1G > /sys/devices/system/node/nodeX/reclaim
>

I think it would be more flexible to specify a node mask as a parameter,
along with the amount of memory, to the memory.reclaim memcg interface
proposed by Yosry. Doing it node by node is more cumbersome, and it is
just a special case of reclaiming from the root cgroup for a specific
node.

Wei Gu, Ying and I have had some discussions on this:
https://lore.kernel.org/all/df6110a09cacc80ee1cbe905a71273a5f3953e16.camel@linux.intel.com/

Tim

> It is based on the discussions from David's thread[1] as
> well as the current upstreaming of the memcg[2] interface
> (which has nice explanations for the benefits of userspace
> reclaim overall). In both cases conclusions were that either
> way of inducing proactive reclaim should be KISS, and can be
> later extended. So this patch does not allow the user much
> fine tuning beyond the size of the reclaim, such as anon/file
> or whether or semantics of demotion.
>
> [1] https://lore.kernel.org/all/5df21376-7dd1-bf81-8414-32a73cea45dd@google.com/
> [2] https://lore.kernel.org/all/20220408045743.1432968-1-yosryahmed@google.com/
>
> Signed-off-by: Davidlohr Bueso
> ---
>  Documentation/ABI/stable/sysfs-devices-node | 10 ++++
>  drivers/base/node.c                         |  2 +
>  include/linux/swap.h                        | 16 ++++++
>  mm/vmscan.c                                 | 59 +++++++++++++++++++++
>  4 files changed, 87 insertions(+)
>
> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> index 8db67aa472f1..3c935e1334f7 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -182,3 +182,13 @@ Date:		November 2021
>  Contact:	Jarkko Sakkinen
>  Description:
>  		The total amount of SGX physical memory in bytes.
> +
> +What:		/sys/devices/system/node/nodeX/reclaim
> +Date:		April 2022
> +Contact:	Davidlohr Bueso
> +Description:
> +		Write the amount of bytes to induce memory reclaim in this node.
> +		This file accepts a single key, the number of bytes to reclaim.
> +		When it completes successfully, the specified amount or more memory
> +		will have been reclaimed, and -EAGAIN if less bytes are reclaimed
> +		than the specified amount.
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 6cdf25fd26c3..d80c478e2a6e 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -670,6 +670,7 @@ static int register_node(struct node *node, int num)
>
>  	hugetlb_register_node(node);
>  	compaction_register_node(node);
> +	reclaim_register_node(node);
>  	return 0;
>  }
>
> @@ -685,6 +686,7 @@ void unregister_node(struct node *node)
>  	hugetlb_unregister_node(node);		/* no-op, if memoryless node */
>  	node_remove_accesses(node);
>  	node_remove_caches(node);
> +	reclaim_unregister_node(node);
>  	device_unregister(&node->dev);
>  }
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 27093b477c5f..cca43ae6d770 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -398,6 +398,22 @@ extern unsigned long shrink_all_memory(unsigned long nr_pages);
>  extern int vm_swappiness;
>  long remove_mapping(struct address_space *mapping, struct folio *folio);
>
> +#if defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
> +extern int reclaim_register_node(struct node *node);
> +extern void reclaim_unregister_node(struct node *node);
> +
> +#else
> +
> +static inline int reclaim_register_node(struct node *node)
> +{
> +	return 0;
> +}
> +
> +static inline void reclaim_unregister_node(struct node *node)
> +{
> +}
> +#endif /* CONFIG_SYSFS && CONFIG_NUMA */
> +
>  extern unsigned long reclaim_pages(struct list_head *page_list);
>  #ifdef CONFIG_NUMA
>  extern int node_reclaim_mode;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1735c302831c..3539f8a0f0ea 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4819,3 +4819,62 @@ void check_move_unevictable_pages(struct pagevec *pvec)
>  	}
>  }
>  EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
> +
> +#if defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
> +static ssize_t reclaim_store(struct device *dev,
> +			     struct device_attribute *attr,
> +			     const char *buf, size_t count)
> +{
> +	int err, nid = dev->id;
> +	gfp_t gfp_mask = GFP_KERNEL;
> +	struct pglist_data *pgdat = NODE_DATA(nid);
> +	unsigned long nr_to_reclaim, nr_reclaimed = 0;
> +	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
> +	struct scan_control sc = {
> +		.gfp_mask = current_gfp_context(gfp_mask),
> +		.reclaim_idx = gfp_zone(gfp_mask),
> +		.priority = NODE_RECLAIM_PRIORITY,
> +		.may_writepage = !laptop_mode,
> +		.may_unmap = 1,
> +		.may_swap = 1,
> +	};
> +
> +	buf = strstrip((char *)buf);
> +	err = page_counter_memparse(buf, "", &nr_to_reclaim);
> +	if (err)
> +		return err;
> +
> +	sc.nr_to_reclaim = max(nr_to_reclaim, SWAP_CLUSTER_MAX);
> +
> +	while (nr_reclaimed < nr_to_reclaim) {
> +		unsigned long reclaimed;
> +
> +		if (test_and_set_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
> +			return -EAGAIN;
> +
> +		/* does cond_resched() */
> +		reclaimed = __node_reclaim(pgdat, gfp_mask,
> +					   nr_to_reclaim - nr_reclaimed, &sc);
> +
> +		clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
> +
> +		if (!reclaimed && !nr_retries--)
> +			break;
> +
> +		nr_reclaimed += reclaimed;
> +	}
> +
> +	return nr_reclaimed < nr_to_reclaim ? -EAGAIN : count;
> +}
> +
> +static DEVICE_ATTR_WO(reclaim);
> +int reclaim_register_node(struct node *node)
> +{
> +	return device_create_file(&node->dev, &dev_attr_reclaim);
> +}
> +
> +void reclaim_unregister_node(struct node *node)
> +{
> +	return device_remove_file(&node->dev, &dev_attr_reclaim);
> +}
> +#endif
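The size parsing and retry semantics of reclaim_store() in the quoted patch can be exercised outside the kernel. What follows is a minimal userspace model in C, not kernel code, and everything in it is an assumption for illustration: the hypothetical names model_parse_pages(), model_reclaim_store(), capped_pass() and stalling_pass(), a 4 KiB page size, and a retry budget of 16 standing in for MAX_RECLAIM_RETRIES. The parser only loosely mirrors memparse()/page_counter_memparse() (binary K/M/G/T/P/E suffixes, nothing trailing), and the PGDAT_RECLAIM_LOCKED check and scan_control are left out.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values for this userspace model; not taken from kernel headers. */
#define MODEL_PAGE_SIZE   4096ULL
#define MODEL_MAX_RETRIES 16U

/*
 * Hypothetical helper: parse "<number>[KMGTPE]" (binary suffixes,
 * case-insensitive, nothing trailing) into a page count. Loosely mirrors
 * memparse() + page_counter_memparse(); overflow is not checked.
 */
static int model_parse_pages(const char *buf, unsigned long long *nr_pages)
{
	static const char suffixes[] = "KMGTPE";
	char *end;
	unsigned long long bytes = strtoull(buf, &end, 0);
	const char *s;
	int shift = 0;

	if (end == buf)
		return -EINVAL;		/* no digits at all */
	s = *end ? strchr(suffixes, toupper((unsigned char)*end)) : NULL;
	if (s) {
		shift = 10 * (int)(s - suffixes + 1);	/* K=10, M=20, G=30, ... */
		end++;
	}
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	*nr_pages = (bytes << shift) / MODEL_PAGE_SIZE;
	return 0;
}

/* One mocked __node_reclaim() pass: returns the pages freed this pass. */
typedef unsigned long long (*reclaim_pass_fn)(unsigned long long want, void *ctx);

/*
 * Hypothetical model of the reclaim_store() loop: keep calling the pass
 * until the target is met; only zero-progress passes consume the retry
 * budget. Returns 0 on success, -EAGAIN if short, -EINVAL on bad input.
 */
static int model_reclaim_store(const char *buf, reclaim_pass_fn pass,
			       void *ctx, unsigned long long *reclaimed_out)
{
	unsigned long long nr_to_reclaim, nr_reclaimed = 0;
	unsigned int nr_retries = MODEL_MAX_RETRIES;
	int err = model_parse_pages(buf, &nr_to_reclaim);

	if (err)
		return err;
	while (nr_reclaimed < nr_to_reclaim) {
		unsigned long long got = pass(nr_to_reclaim - nr_reclaimed, ctx);

		if (!got && !nr_retries--)
			break;		/* budget was already exhausted */
		nr_reclaimed += got;
	}
	*reclaimed_out = nr_reclaimed;
	return nr_reclaimed < nr_to_reclaim ? -EAGAIN : 0;
}

/* Example mock: frees up to *(unsigned long long *)ctx pages per pass. */
static unsigned long long capped_pass(unsigned long long want, void *ctx)
{
	unsigned long long cap = *(unsigned long long *)ctx;

	return want < cap ? want : cap;
}

/* Example mock: frees 10 pages on each of the first 3 passes, then none. */
struct stall_ctx {
	unsigned int calls;
};

static unsigned long long stalling_pass(unsigned long long want, void *ctx)
{
	struct stall_ctx *st = ctx;

	(void)want;
	return st->calls++ < 3 ? 10 : 0;
}
```

With stalling_pass(), a "1M" request (256 pages under the assumed page size) frees 30 pages and then returns -EAGAIN after 17 zero-progress passes: because of the post-decrement in `!nr_retries--`, the loop only breaks on a zero-progress pass that finds the budget already at zero.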