Date: Mon, 12 Dec 2022 09:36:49 +0100
From: Michal Hocko
To: Wei Xu
Cc: Mina Almasry, Andrew Morton, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, Huang Ying, Yang Shi, Yosry Ahmed, fvdl@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] [mm-unstable] mm: Fix memcg reclaim on memory tiered systems

On Sat 10-12-22 00:01:28, Wei Xu wrote:
> On Fri, Dec 9, 2022 at 1:16 PM Michal Hocko wrote:
> >
> > On Fri 09-12-22 08:41:47, Wei Xu wrote:
> > > On Fri, Dec 9, 2022 at 12:08 AM Michal Hocko wrote:
> > > >
> > > > On Thu 08-12-22 16:59:36, Wei Xu wrote:
> > > > [...]
> > > > > > What I really mean is to add demotion nodes to the nodemask along with
> > > > > > the set of nodes you want to reclaim from.
> > > > > > To me that sounds like a
> > > > > > more natural interface allowing for all sorts of usecases:
> > > > > > - free up demotion targets (only specify demotion nodes in the mask)
> > > > > > - control where to demote (e.g. select specific demotion target(s))
> > > > > > - do not demote at all (skip demotion nodes from the node mask)
> > > > >
> > > > > For clarification, do you mean to add another argument (e.g.
> > > > > demotion_nodes) in addition to the "nodes" argument?
> > > >
> > > > No, nodes=mask argument should control the domain where the memory
> > > > reclaim should happen. That includes both aging and the reclaim. If the
> > > > mask doesn't contain any lower tier node then no demotion will happen.
> > > > If only a subset of lower tiers are specified then only those could be
> > > > used for the demotion process. Or put it otherwise, the nodemask is not
> > > > only used to filter out zonelists during reclaim it also restricts
> > > > migration targets.
> > > >
> > > > Is this more clear now?
> > >
> > > In that case, how can we request demotion only from toptier nodes
> > > (without counting any reclaimed bytes from other nodes), which is our
> > > memory tiering use case?
> >
> > I am not sure I follow. Could you be more specific please?
>
> In our memory tiering use case, we would like to proactively free up
> memory on top-tier nodes by demoting cold pages to lower-tier nodes.
> This is to create enough free top-tier memory for new allocations and
> promotions. How many pages and how often to demote from top-tier
> nodes can depend on a number of factors (e.g. the amount of free
> top-tier memory, the amount of cold pages, the bandwidth pressure on
> lower-tier, the task tolerance of slower memory on performance) and
> are controlled by the userspace policies.
>
> Because the purpose of such proactive demotions is to free up top-tier
> memory, not to lower the amount of memory charged to the memcg, we'd
> like that memory.reclaim can demote the specified amount of bytes from
> the given top-tier nodes. If we have to also provide the lower-tier
> nodes to memory.reclaim to allow demotions, the kernel can reclaim
> from the lower-tier nodes in the same memory.reclaim request. We then
> won't be able to control the amount of bytes to be demoted from
> top-tier nodes.

I am not sure this is something to be handled by the reclaim interface,
because you are now creating an ambiguity about what the interface
should do and starting to depend on it. Consider that we change the
reclaim algorithm in the future and the node you ask to demote from
simply reclaims rather than demotes. This would break your use case,
right?
-- 
Michal Hocko
SUSE Labs
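
For readers following the thread, the nodes=mask semantics proposed in
the quoted text (one mask that defines the reclaim domain and also
restricts migration targets) can be sketched as a toy model. This is
not kernel code: the node ids, the TIER_OF layout and the plan_reclaim
helper are all hypothetical and exist only to illustrate the idea.

```python
# Toy model only: node ids and the tier layout below are hypothetical.
# Tier 0 is the top tier; a larger number means slower memory.
TIER_OF = {0: 0, 1: 0, 2: 1, 3: 1}


def plan_reclaim(nodemask):
    """Map each node in the mask to the nodes it may demote to.

    The single mask defines the reclaim domain and also restricts
    migration targets. An empty target set means pages on that node
    can only be reclaimed, not demoted.
    """
    mask = set(nodemask)
    unknown = mask - set(TIER_OF)
    if unknown:
        raise ValueError(f"unknown nodes: {sorted(unknown)}")
    return {
        node: {t for t in mask if TIER_OF[t] > TIER_OF[node]}
        for node in sorted(mask)
    }


# Top-tier node 0 plus lower-tier node 2: node 0 may demote to node 2,
# but node 2 is itself part of the reclaim domain.
print(plan_reclaim({0, 2}))
# Only top-tier nodes: no lower-tier node in the mask, so no demotion.
print(plan_reclaim({0, 1}))
# Only lower-tier nodes: frees up the demotion targets themselves.
print(plan_reclaim({2, 3}))
```

In the first call node 2 appears both as a demotion target and as a
node to reclaim from. That overlap is the concern raised in the quoted
text: with a single mask, bytes reclaimed from node 2 would count
toward the same request as bytes demoted from node 0.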