Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp8610809rwb; Tue, 13 Dec 2022 08:20:27 -0800 (PST) X-Google-Smtp-Source: AA0mqf6yIxDDrxw7ADOU6TWzkImTST5wayD7/vSZ9K1PDButt2oa/njlJ8eJT9K6cW14MXz/uSmm X-Received: by 2002:a17:907:76d4:b0:7c0:e5c8:d439 with SMTP id kf20-20020a17090776d400b007c0e5c8d439mr18687933ejc.3.1670948427143; Tue, 13 Dec 2022 08:20:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670948427; cv=none; d=google.com; s=arc-20160816; b=IsowyZzy6aqwAZ+FfIui2hvwqOobJ2jJcO4mMIPaBod7gq5O1Uh7Sn3+h1CFglNFDB QqwYzj6Xqpff5QDJow4es/dx9DcPmOM32yZ7RPiBDdc5oD6mwMtR8fakg+FFaZRhf0KE eOKrKwgUkbTs6+/olIa1cIeyQJGbk6xCr5ZqpIUitUAQ1bCLMaukkB3VMzt/mWiqIKWL YVipM42Ow5kfFthCDaJ6hdmg+aPjOEOckhhsBL+r2VUu7fEYWA5uXpgGfCZIyppZOz5k l92idQHsUMlYnHl43HEUMMgPZ70bLfyEv925eMlqYEQGSU2GdJGxiCnRmec//1AEItZD ImaQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=BmoHPjxBfuv0L4/bABJBapiPnoh74ZtmvdJi1uPpypQ=; b=qoebV7jCYoECtl2iwuYM74urJEj7DNb3KCkyF/doOdFbM+L69VOTXXwBBc9ow1qnsG SHSgEcnGJg3+zTgR+HNUlZIHM1tOhIQP7oo7hy2UsOQfNn+zn/36AhiKlWqf7xEfB5vg OxedlPax8ajufeoN54lAByE74lko3BaP4zLynHtvowB3rmjUN4WFXkbdQLCEVKjFFJaI cyk8/IvcwsNzXuiWUoWbtlEjKv72Wd5GAPa6Z2+9rovPZeAsTyedzi9wV83Y0tZ7mKAQ JvykTZX7wwpkuUkAL+RCQj5G+/OIhZIaqmu+gXb8hQ8SQO+PKo+ZioxbXMzPovIsZfH3 CdUA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20210112.gappssmtp.com header.s=20210112 header.b=tYSHyL3u; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v11-20020a17090606cb00b007adcc8fb7a0si6991533ejb.399.2022.12.13.08.20.09; Tue, 13 Dec 2022 08:20:27 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20210112.gappssmtp.com header.s=20210112 header.b=tYSHyL3u; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234899AbiLMP7H (ORCPT + 72 others); Tue, 13 Dec 2022 10:59:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235919AbiLMP7A (ORCPT ); Tue, 13 Dec 2022 10:59:00 -0500 Received: from mail-qv1-xf36.google.com (mail-qv1-xf36.google.com [IPv6:2607:f8b0:4864:20::f36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9FA801121 for ; Tue, 13 Dec 2022 07:58:56 -0800 (PST) Received: by mail-qv1-xf36.google.com with SMTP id d2so10689334qvp.12 for ; Tue, 13 Dec 2022 07:58:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=BmoHPjxBfuv0L4/bABJBapiPnoh74ZtmvdJi1uPpypQ=; b=tYSHyL3uT8BsXSP/ddwQ0c6U4nhdyNjgWj87JFBf3wKrv2/u1Rhvnbh0tcm5zQCdx+ Z3m6/3BDj8GObhrJtaTfOHOsj6JxliVkHsNt/y0O1l9MVpjQIn91+Sy3NrwwQeqcfZlQ S2PfH5LtOg/HYacxDBleObOg3/CePMpki5SMd7deB8e96YqDEQIAuxEAcxD2M+UZJxtA QqHXZx0xJUBuB9rXKD3WnIIY7Wo9FVQuTVEQnC4OudC5oSNZSw/P1OO6bf97dDABhUhK XjMDuG/ga6AuWOT2s47vBi+LPy3BfEM7s61d4FNSk/qXm+kDo6TDYl+nyQ5a4+92oU8M hROg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=BmoHPjxBfuv0L4/bABJBapiPnoh74ZtmvdJi1uPpypQ=; b=5v8n03POn6pxXGeUufnujGd+JNzU49AV+Jqvcme0Kiz+SE0eu6rva9hgC98Kl51K06 N5eBTvmYb55wCNxAKI0gmXJKb5gwMsUuCLo3uLQGtQBf8ut1T2bBwxj0hpDH/TAY4cno qWqUUl8IgI22PdJr8meEb2JjkJ2EqCHB9Ftr6wct4XwL5bTPwBdd3i8O8U69CziyDbki O6iuT4CjKVvL5yAYKfe4QNpa3nXsVLe9MIWKqBrOOQD6f08YT7904chKvquVD6cMCcFw lqS6Wb9xI4XCIPr3xbJ52pieluc82FupLsKUUP08ciWJuL0TKJHHc4bD9Ch7JDhE8md/ iEDA== X-Gm-Message-State: ANoB5pmy4RveIQ3DY/CcfyX9wcPeddnr9BjM6L7XF/F8BuhDnASzqXBG mI+v4dGD7gWihahYIddjAgO0iQ== X-Received: by 2002:a05:6214:2b8b:b0:4c7:82c7:8e12 with SMTP id kr11-20020a0562142b8b00b004c782c78e12mr29021586qvb.39.1670947135706; Tue, 13 Dec 2022 07:58:55 -0800 (PST) Received: from localhost ([107.116.82.98]) by smtp.gmail.com with ESMTPSA id bk24-20020a05620a1a1800b006ee77f1ecc3sm8067725qkb.31.2022.12.13.07.58.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Dec 2022 07:58:55 -0800 (PST) Date: Tue, 13 Dec 2022 16:58:50 +0100 From: Johannes Weiner To: Michal Hocko Cc: Mina Almasry , Tejun Heo , Zefan Li , Jonathan Corbet , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton , Huang Ying , Yang Shi , Yosry Ahmed , weixugc@google.com, fvdl@google.com, bagasdotme@gmail.com, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH v3] mm: Add nodes= arg to memory.reclaim Message-ID: References: <20221202223533.1785418-1-almasrymina@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Dec 13, 2022 at 09:33:24AM +0100, Michal Hocko wrote: > I do recognize your need to control the demotion but I argue that it is > a bad idea to rely on an implicit behavior of the memory reclaim and an > interface which is _documented_ to primarily _reclaim_ memory. I think memory.reclaim should demote as part of page aging. What I'd like to avoid is *having* to manually control the aging component in the interface (e.g. making memory.reclaim *only* reclaim, and *requiring* a coordinated use of memory.demote to ensure progress.) > Really, consider that the current demotion implementation will change > in the future and based on a newly added heuristic memory reclaim or > compression would be preferred over migration to a different tier. This > might completely break your current assumptions and break your usecase > which relies on an implicit demotion behavior. Do you see that as a > potential problem at all? What shall we do in that case? Special case > memory.reclaim behavior? Shouldn't that be derived from the distance propertiers in the tier configuration? I.e. if local compression is faster than demoting to a slower node, we should maybe have a separate tier for that. Ignoring proactive reclaim or demotion commands for a second: on that node, global memory pressure should always compress first, while the oldest pages from the compression cache should demote to the other node(s) - until they eventually get swapped out. However fine-grained we make proactive reclaim control over these stages, it should at least be possible for the user to request the default behavior that global pressure follows, without jumping through hoops or requiring the coordinated use of multiple knobs. So IMO there is an argument for having a singular knob that requests comprehensive aging and reclaiming across the configured hierarchy. As far as explicit control over the individual stages goes - no idea if you would call the compression stage demotion or reclaim. The distinction still does not make much of sense to me, since reclaim is just another form of demotion. Sure, page faults have a different access latency than dax to slower memory. But you could also have 3 tiers of memory where the difference between tier 1 and 2 is much smaller than the difference between 2 and 3, and you might want to apply different demotion rates between them as well. The other argument is that demotion does not free cgroup memory, whereas reclaim does. But with multiple memory tiers of vastly different performance, isn't there also an argument for granting cgroups different shares of each memory? So that a higher priority group has access to a bigger share of the fastest memory, and lower prio cgroups are relegated to lower tiers. If we split those pools, then "demotion" will actually free memory in a cgroup. This is why I liked adding a nodes= argument to memory.reclaim the best. It doesn't encode a distinction that may not last for long. The problem comes from how to interpret the input argument and the return value, right? Could we solve this by requiring the passed nodes= to all be of the same memory tier? Then there is no confusion around what is requested and what the return value means. And if no nodes are passed, it means reclaim (from the lowest memory tier) X pages and demote as needed, then return the reclaimed pages. > Now to your specific usecase. If there is a need to do a memory > distribution balancing then fine but this should be a well defined > interface. E.g. is there a need to not only control demotion but > promotions as well? I haven't heard anybody requesting that so far > but I can easily imagine that like outsourcing the memory reclaim to > the userspace someone might want to do the same thing with the numa > balancing because $REASONS. Should that ever happen, I am pretty sure > hooking into memory.reclaim is not really a great idea. Should this ever happen, it would seem fair that that be a separate knob anyway, no? One knob to move the pipeline in one direction (aging), one knob to move it the other way.