Date: Mon, 12 Dec 2022 09:55:54 +0100
From: Michal Hocko
To: Mina Almasry
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Roman Gushchin,
    Shakeel Butt, Muchun Song, Andrew Morton, Huang Ying, Yang Shi,
    Yosry Ahmed, weixugc@google.com, fvdl@google.com, bagasdotme@gmail.com,
    cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3] mm: Add nodes= arg to memory.reclaim
In-Reply-To: <20221202223533.1785418-1-almasrymina@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 02-12-22 14:35:31, Mina Almasry wrote:
> The nodes= arg instructs the kernel to only scan the given nodes for
> proactive reclaim. For example use cases, consider a 2 tier memory system:
>
> nodes 0,1 -> top tier
> nodes 2,3 -> second tier
>
> $ echo "1m nodes=0" > memory.reclaim
>
> This instructs the kernel to attempt to reclaim 1m memory from node 0.
> Since node 0 is a top tier node, demotion will be attempted first. This
> is useful to direct proactive reclaim to specific nodes that are under
> pressure.
>
> $ echo "1m nodes=2,3" > memory.reclaim
>
> This instructs the kernel to attempt to reclaim 1m memory in the second tier,
> since this tier of memory has no demotion targets the memory will be
> reclaimed.
>
> $ echo "1m nodes=0,1" > memory.reclaim
>
> Instructs the kernel to reclaim memory from the top tier nodes, which can
> be desirable according to the userspace policy if there is pressure on
> the top tiers. Since these nodes have demotion targets, the kernel will
> attempt demotion first.
>
> Since commit 3f1509c57b1b ("Revert "mm/vmscan: never demote for memcg
> reclaim""), the proactive reclaim interface memory.reclaim does both
> reclaim and demotion. Reclaim and demotion incur different latency costs
> to the jobs in the cgroup. Demoted memory would still be addressable
> by the userspace at a higher latency, but reclaimed memory would need to
> incur a pagefault.
>
> The 'nodes' arg is useful to allow the userspace to control demotion
> and reclaim independently according to its policy: if the memory.reclaim
> is called on a node with demotion targets, it will attempt demotion first;
> if it is called on a node without demotion targets, it will only attempt
> reclaim.
>
> Acked-by: Michal Hocko
> Signed-off-by: Mina Almasry

After discussion in [1] I have realized that I haven't really thought
through all the consequences of this patch and therefore I am retracting
my ack here. I am not nacking the patch at this stage but I also think
this shouldn't be merged now and we should really consider all the
consequences.

Let me summarize my main concerns here as well. The proposed
implementation doesn't apply the provided nodemask to the whole reclaim
process.
This means that demotion can happen outside of the mask, so the user
request cannot really control demotion targets. That limits the
interface should there be any need for finer grained control in the
future (see an example in [2]).

Another problem is that this can limit future reclaim extensions because
of existing assumptions about the interface [3]: a user can specify only
top-tier nodes to force aging without actually reclaiming any charges,
and so (ab)use the interface purely for aging on a multi-tier system. A
change to the reclaim to not demote in some cases could break this
use case.

My counter proposal would be to define the nodemask for memory.reclaim as
a domain to constrain the charge reclaim. That covers both aging and
reclaim, including demotion, which is a part of aging. This would allow
controlling where to demote for balancing purposes (e.g. demote to node
2 rather than 3), which is impossible with the proposed scheme.

[1] http://lkml.kernel.org/r/20221206023406.3182800-1-almasrymina@google.com
[2] http://lkml.kernel.org/r/Y5bnRtJ6sojtjgVD@dhcp22.suse.cz
[3] http://lkml.kernel.org/r/CAAPL-u8rgW-JACKUT5ChmGSJiTDABcDRjNzW_QxMjCTk9zO4sg@mail.gmail.com
-- 
Michal Hocko
SUSE Labs
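
The difference between the two nodemask semantics discussed above can be
sketched as a toy userspace model. This is not kernel code: the demotion
map, the function names plan_scan_only() and plan_domain(), and the rule
that a node with no in-mask demotion target falls back to plain reclaim
are all assumptions made for illustration, using the 2 tier layout from
the quoted changelog (nodes 0,1 top tier, nodes 2,3 second tier).

```python
# Toy model only. Node ids, the demotion map and both function names are
# hypothetical and do not correspond to kernel data structures or APIs.
DEMOTION_TARGETS = {0: [2, 3], 1: [2, 3], 2: [], 3: []}


def plan_scan_only(mask):
    """Patch as posted: the mask selects which nodes are scanned, but
    demotion may still land on any target of a scanned node."""
    plan = {}
    for node in sorted(mask):
        targets = DEMOTION_TARGETS[node]
        plan[node] = ("demote", targets) if targets else ("reclaim", [])
    return plan


def plan_domain(mask):
    """Counter proposal: the mask is a domain for the whole operation,
    so demotion targets outside the mask are dropped. Assumption: with
    no target left inside the mask, the node is reclaimed instead."""
    plan = {}
    for node in sorted(mask):
        targets = [t for t in DEMOTION_TARGETS[node] if t in mask]
        plan[node] = ("demote", targets) if targets else ("reclaim", [])
    return plan


if __name__ == "__main__":
    print(plan_scan_only({0}))   # demotion targets lie outside the mask
    print(plan_domain({0, 2}))   # demotion restricted to node 2
    print(plan_domain({0}))      # no in-mask target left for node 0
```

Under this model, nodes=0 with the posted semantics still demotes to
nodes 2 and 3, while the domain semantics with nodes=0,2 would steer
demotion to node 2 only.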