MIME-Version: 1.0
References: <20221122203850.2765015-1-almasrymina@google.com> <874juonbmv.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <874juonbmv.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: Yang Shi
Date: Mon, 28 Nov 2022 14:24:03 -0800
Subject: Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim
To: "Huang, Ying"
Cc: Johannes Weiner, Mina Almasry, Yang Shi, Yosry Ahmed, Tim Chen,
    weixugc@google.com, shakeelb@google.com, gthelen@google.com,
    fvdl@google.com, Michal Hocko, Roman Gushchin, Muchun Song,
    Andrew Morton, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-mm@kvack.org
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 23, 2022 at 9:52 PM Huang, Ying wrote:
>
> Hi, Johannes,
>
> Johannes Weiner writes:
> [...]
> >
> > The fallback to reclaim actually strikes me as wrong.
> >
> > Think of reclaim as 'demoting' the pages to the storage tier. If we
> > have a RAM -> CXL -> storage hierarchy, we should demote from RAM to
> > CXL and from CXL to storage. If we reclaim a page from RAM, it means
> > we 'demote' it directly from RAM to storage, bypassing potentially a
> > huge amount of pages colder than it in CXL. That doesn't seem right.
> >
> > If demotion fails, IMO it shouldn't satisfy the reclaim request by
> > breaking the layering. Rather it should deflect that pressure to the
> > lower layers to make room. This makes sure we maintain an aging
> > pipeline that honors the memory tier hierarchy.
>
> Yes. I think that we should avoid to fall back to reclaim as much as
> possible too. Now, when we allocate memory for demotion
> (alloc_demote_page()), __GFP_KSWAPD_RECLAIM is used. So, we will trigger
> kswapd reclaim on lower tier node to free some memory to avoid fall back
> to reclaim on current (higher tier) node. This may be not good enough,
> for example, the following patch from Hasan may help via waking up
> kswapd earlier.

For the ideal case, I do agree with Johannes that we should demote
pages tier by tier rather than reclaim them from the higher tiers. But
I also agree with your premature OOM concern.

>
> https://lore.kernel.org/linux-mm/b45b9bf7cd3e21bca61d82dcd1eb692cd32c122c.1637778851.git.hasanalmaruf@fb.com/
>
> Do you know what is the next step plan for this patch?
>
> Should we do even more?

My initial implementation had simple throttle logic for the case where
demotion is not going to succeed: it throttled when the demotion target
did not have enough free memory (just a watermark check) for migration
to succeed without any reclaim. Shall we resurrect that?

Waking kswapd sooner is fine with me, but it may not be enough. For
example, kswapd may not keep up, so premature OOM may happen on the
higher tiers, or reclaim may still happen.
I think throttling the reclaimer/demoter until kswapd makes progress
could avoid both. And since lower-tier memory is typically much larger
than higher-tier memory, the throttle should happen very rarely IMHO.

>
> From another point of view, I still think that we can use falling back
> to reclaim as the last resort to avoid OOM in some special situations,
> for example, most pages in the lowest tier node are mlock() or too hot
> to be reclaimed.
>
> > So I'm hesitant to design cgroup controls around the current behavior.
> >
>
> Best Regards,
> Huang, Ying
>
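
To make the throttle idea discussed above a bit more concrete, here is a
rough userspace model of the decision, not kernel code. All names in it
(struct tier, demote_decide(), the retries_left budget) are hypothetical
and exist only for illustration. The watermark comparison is a stand-in
for whatever check the real implementation would use, and the retry
budget is just one assumed way to keep reclaim as a last resort.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one memory tier; not a kernel structure. */
struct tier {
    const char *name;
    long free_pages;     /* pages currently free on this tier */
    long min_watermark;  /* stand-in for the real watermark check */
    struct tier *lower;  /* next (slower) tier, NULL for the lowest */
};

enum demote_action {
    DEMOTE_NOW,       /* target has room: migrate without reclaiming */
    DEMOTE_THROTTLE,  /* target is short: wait for kswapd to catch up */
    DEMOTE_RECLAIM,   /* no usable target: reclaim as the last resort */
};

/*
 * Decide what to do with nr_pages cold pages sitting on tier t.
 * retries_left is an assumed budget: once it reaches zero we stop
 * throttling and fall back to reclaim instead of risking OOM.
 */
static enum demote_action demote_decide(const struct tier *t, long nr_pages,
                                        int retries_left)
{
    const struct tier *target;

    assert(t != NULL && nr_pages >= 0);

    target = t->lower;
    if (target == NULL)
        return DEMOTE_RECLAIM;  /* lowest tier: only storage is below */

    /* "Just check the watermark": does the target stay above it? */
    if (target->free_pages - nr_pages >= target->min_watermark)
        return DEMOTE_NOW;

    if (retries_left > 0)
        return DEMOTE_THROTTLE;

    return DEMOTE_RECLAIM;
}
```

In this model a caller would loop on DEMOTE_THROTTLE, decrementing
retries_left each time kswapd on the lower tier fails to make progress,
so the higher tier is only reclaimed directly once the budget runs out.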