Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp8001595rwb; Wed, 23 Nov 2022 14:00:35 -0800 (PST) X-Google-Smtp-Source: AA0mqf65uic/saKRMNdVvw3jVYdqXaTJXmXLaNLf1DWF+7PqjAtE9SF6t7xgY8HQwhHXwYPZYw5E X-Received: by 2002:a17:902:e04b:b0:186:94b8:5980 with SMTP id x11-20020a170902e04b00b0018694b85980mr11520455plx.113.1669240835244; Wed, 23 Nov 2022 14:00:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669240835; cv=none; d=google.com; s=arc-20160816; b=dZt39Kq9Z52SupaF3SH1EZZ1Zflmj5Nzkr2sLM+p9YbvO21FwdNJgo9NOYlqCbcqyW 86wz7YcpJUW9xeLGH76Yx88iR2l+8ZEShxZ/+Gj9yMkRbKG4hCJ4NhIzLyWRpffd+JY3 o2Cey/rS43ogl7Rie6FKyNS8KRbBAr3K+KWZfbia1poGmehNhpE1r5LlwlpayYPl8yMP drd1HTn+NVEtVafNPfPYwcdj3lRCFr1jnpXQta9HQ6jm5vwxntouZ8wpNm6D3wtXHeSK pzpkONRK6KMnRAUOftyOFAel/5hruBX0xpQ0tzmO+7gbzvHsCSI0BdMvlN22mgth93Xc XSQQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=KgB35nu4bsdO3BRjzB71vDc5gogI51cR7qkBlnNkHoA=; b=JdEZeUf3nVJsgTUwVubk0lVLXSXQR2bDgGAkMF6TaYOO+jp2mSxsebQLPjSbBVUD/r X8Y3GKtvIbqJ0hhKcQ5gqxA+xCyVV6Wa8O0MS5W7EJBRi6kw3d590pOZJqwcFPi0YOo/ rQb1zNQVyNdhjvr+Yd5TxnZKJcWTpZJv6guCCrOrDlUsHb5C4iNNURnE1/549/Di9xnY g9qcyDhuJx5X91sEO45cBM0hgxE7LRuB9UN7/IFNYJFhNd8oTH8d05fblKj/ZPKtENhz yynR9a+tEEwMzi2IEB5UJlZC9ujN0jvqj2/nU52fDg8Z4Y0edfLd3JgfRLzghRvwgki8 Vgwg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=RgUxytdJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a186-20020a6390c3000000b0047730e1a823si736255pge.858.2022.11.23.14.00.24; Wed, 23 Nov 2022 14:00:35 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=RgUxytdJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235724AbiKWVVQ (ORCPT + 88 others); Wed, 23 Nov 2022 16:21:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39132 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237712AbiKWVVN (ORCPT ); Wed, 23 Nov 2022 16:21:13 -0500 Received: from mail-vs1-xe29.google.com (mail-vs1-xe29.google.com [IPv6:2607:f8b0:4864:20::e29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C0816C707 for ; Wed, 23 Nov 2022 13:21:10 -0800 (PST) Received: by mail-vs1-xe29.google.com with SMTP id t5so18697539vsh.8 for ; Wed, 23 Nov 2022 13:21:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=KgB35nu4bsdO3BRjzB71vDc5gogI51cR7qkBlnNkHoA=; b=RgUxytdJdEgfB4d1z2o+FdTqXryjzLvZ4nGRF+rSNbtAiNkj/xWRZCXnEjox3Qnzus ibYwfYLUb6vo6Ch7w2FllZzJR4GyBdOl1wo6nDO89NSiIiWNPsb+9CUw9zNf01MWuTnJ RdDrEnA2Jgkjv+Mfg/0Rnkttl2WaHuZQOqnJh5Dm5AaMts8LSC04yTsJiR/RqXB+TsdE x88zNb+4QJUQ0Rz6veIlgutL2dMskRBO8Ej/+jwwlOOpd781ex+ifKun2SDUVWfsZADZ s2tmYle4/MZIRDRw+fMyC1IAOXboaFxxgm7WPPnqKBTrGl9qzn5XX+EbX5Oi5AaUCwwn Hu0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=KgB35nu4bsdO3BRjzB71vDc5gogI51cR7qkBlnNkHoA=; b=ehlwTxwfq8YWnBotvxkbsvvGeGCRLQQP+tRoZqcNXZcDo8hDO+8AEOPcWyfGQnBHv9 SHZR08ckvRCMum7sBK/a1i/Z62gtVYLlh/Rsq/8zgBTYw1U9WUyqPCJs/CYemyTroCDK IqTkNQYIIHS4GoTZMNVrr5icCxD7P61xwmYBEBAqf6qhQyzNPuP+CohCeiPztJXUP6on LxnY4MX6GjgTaSGiCjANwpjJyNHpSmrSJFedsXmOwHvLq55QVH3BVDTZC4aWI6FCH6We 5dPe6WvES+CzRmS1taOyGr5NREKvXx4KgjbZlT+ltJL+Fn+jaT6WgJaHD9SeEhYVeR8E L83A== X-Gm-Message-State: ANoB5pmBGbBkL8pa4n6GVPhL0mvljySi4foJ0RN50aEmUvqNreHvAQaM uYkpRO6e1SKzQTFnkbR0J08A/Di3joqgq4B+/FQIDg== X-Received: by 2002:a05:6102:54a5:b0:3b0:7462:a88c with SMTP id bk37-20020a05610254a500b003b07462a88cmr1981396vsb.49.1669238469273; Wed, 23 Nov 2022 13:21:09 -0800 (PST) MIME-Version: 1.0 References: <20221122203850.2765015-1-almasrymina@google.com> In-Reply-To: From: Mina Almasry Date: Wed, 23 Nov 2022 13:20:57 -0800 Message-ID: Subject: Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim To: Johannes Weiner Cc: Huang Ying , Yang Shi , Yosry Ahmed , Tim Chen , weixugc@google.com, shakeelb@google.com, gthelen@google.com, fvdl@google.com, Michal Hocko , Roman Gushchin , Muchun Song , Andrew Morton , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 23, 2022 at 10:00 AM Johannes Weiner wrote: > > Hello Mina, > > On Tue, Nov 22, 2022 at 12:38:45PM -0800, Mina Almasry wrote: > > Since commit 3f1509c57b1b ("Revert "mm/vmscan: never demote for memcg > > reclaim""), the proactive reclaim interface memory.reclaim does both > > reclaim and demotion. This is likely fine for us for latency critical > > jobs where we would want to disable proactive reclaim entirely, and is > > also fine for latency tolerant jobs where we would like to both > > proactively reclaim and demote. > > > > However, for some latency tiers in the middle we would like to demote but > > not reclaim. This is because reclaim and demotion incur different latency > > costs to the jobs in the cgroup. Demoted memory would still be addressable > > by the userspace at a higher latency, but reclaimed memory would need to > > incur a pagefault. > > > > To address this, I propose having reclaim-only and demotion-only > > mechanisms in the kernel. There are a couple possible > > interfaces to carry this out I considered: > > > > 1. Disable demotion in the memory.reclaim interface and add a new > > demotion interface (memory.demote). > > 2. Extend memory.reclaim with a "demote=" flag to configure the demotion > > behavior in the kernel like so: > > - demote=0 would disable demotion from this call. > > - demote=1 would allow the kernel to demote if it desires. > > - demote=2 would only demote if possible but not attempt any > > other form of reclaim. > > Unfortunately, our proactive reclaim stack currently relies on > memory.reclaim doing both. It may not stay like that, but I'm a bit > wary of changing user-visible semantics post-facto. > > In patch 2, you're adding a node interface to memory.demote. Can you > add this to memory.reclaim instead? This would allow you to control > demotion and reclaim independently as you please: if you call it on a > node with demotion targets, it will demote; if you call it on a node > without one, it'll reclaim. And current users will remain unaffected. Hello Johannes, thanks for taking a look! I can certainly add the "nodes=" arg to memory.reclaim and you're right, that would help in bridging the gap. However, if I understand the underlying code correctly, with only the nodes= arg the kernel will indeed attempt demotion first, but the kernel will also merrily fall back to reclaiming if it can't demote the full amount. I had hoped to have the flexibility to protect latency sensitive jobs from reclaim entirely while attempting to do demotion. There are probably ways to get around that in the userspace. I presume the userspace can check if there is available memory on the node's demotion targets, and if so, the kernel should demote-only. But I feel that wouldn't be reliable as the demotion logic may change across kernel versions. The userspace may think the kernel would demote but instead demotion failed due to whatever heuristic introduced into the new kernel version. The above is just one angle of the issue. Another angle (which Yosry would care most about I think) is that at Google we call memory.reclaim mainly when memory.current is too close to memory.max and we expect the memory usage of the cgroup to drop as a result of a success memory.reclaim call. I suspect once we take in commit 3f1509c57b1b ("Revert "mm/vmscan: never demote for memcg reclaim""), we would run into that regression, but I defer to Yosry here, he may have a solution for that in mind already. For these reasons, what do you think about adding both the "nodes=" and "demote=" args from my patch series to memory.reclaim? With these args the default memory.reclaim behavior would suit meta as-is (no change) but we would be able to configure it to address our use cases as well. As always, thanks for taking the time to review!