Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp1219680rwb; Tue, 29 Nov 2022 10:25:10 -0800 (PST) X-Google-Smtp-Source: AA0mqf5sy1VKTmZFHGJtdjqrLPzy9o7TiiWQRRXIYa2APrt6FB6CPj04j1rBqmV53y9eiA4yplCK X-Received: by 2002:a63:f056:0:b0:477:a33f:4858 with SMTP id s22-20020a63f056000000b00477a33f4858mr33072989pgj.76.1669746310502; Tue, 29 Nov 2022 10:25:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669746310; cv=none; d=google.com; s=arc-20160816; b=wpnJ/B+AZDSfVSwmhBUS0oQuLsXtgW8W02UYVPPxWzQ0k4cZ7I4GEV3w3JlLkVn+Mo wgHQRrYow/+dqtA9o/lPn+ZA03ab8VFlSTbh1kIi9XnWqEoXILGJU+Z7BQLykhhKm2Jo ttbJiQBrllQT2ijejiLoxUJ2YzmOJvPlPuVRUdpW3yh3AyTT1wj7SA6VJKe5NnpALKo+ LGZGYrgGkg+0kzysRPYvSACjnIz69DF2dim2q/HLqI/zHHDSI1hig8BFJvMS2Vz9BOD3 AJr1gwxBBLRuz/UAJ9v50TH3K6b2WB5AsnJuE68RgJniHRb1WPSXgRB32SRNFUqROETj Lf3Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=uLGeaz8NT3BELQqUMecnx7omQ6ldn4HqO+/oulNlF3g=; b=XhCCRgrejCbhSWjCo3nmIpUEzK82CIVYiqVwj2UiWSc6xrgVlz6NddgvEHyykg/hMv xVM/2wUGoHVJ8ZibHx2yA7my3emgWQNvR6+iZCo489JTSvbJjruHTh2fTqAFDJrOpgVn lOK3qi3glmQGs/JKFv+MaKpE7jsWbUKsKA3bRrJjintVfXxwVgeeDwYyHcmClrIrN1Kl MukZPjtLsyhzGDFvgxFA+JUIgBhrdTiaFDTnF7G1+sGcE7I9OKlRE71ssneDYM7RGgLa agRBXgipBnhUedBv7x7dLPl5tBX3lOLC63D1EecHyrrlnTxe7ps/IKuKgn+Ie7vilRFn o3GQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20210112.gappssmtp.com header.s=20210112 header.b=iJplMCRi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y9-20020a17090ad70900b00211f23d6f49si2122211pju.15.2022.11.29.10.24.58; Tue, 29 Nov 2022 10:25:10 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20210112.gappssmtp.com header.s=20210112 header.b=iJplMCRi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234059AbiK2SIQ (ORCPT + 84 others); Tue, 29 Nov 2022 13:08:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32968 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232360AbiK2SIG (ORCPT ); Tue, 29 Nov 2022 13:08:06 -0500 Received: from mail-qv1-xf33.google.com (mail-qv1-xf33.google.com [IPv6:2607:f8b0:4864:20::f33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE8CC1F9E3 for ; Tue, 29 Nov 2022 10:08:03 -0800 (PST) Received: by mail-qv1-xf33.google.com with SMTP id e18so8915675qvs.1 for ; Tue, 29 Nov 2022 10:08:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=uLGeaz8NT3BELQqUMecnx7omQ6ldn4HqO+/oulNlF3g=; b=iJplMCRi2LhhQadE374lqFmED+tpdcxnO/27Tz7GhsxsGYaov3AHSLgVredAl4EKwf y8y410DKLDpSpT3Vg/nc8WJi8/kmTtcmzjrB7+f6ia7nIR8Fx3ZTXcjIUFvPzTGKcA7N sWqZP2O1KSGzmGfWdZulbDtgFgQt+fgC6FqTMrTCWJYjAcIvClChZJLfa3V7OLx7EIiD 3/UMSLouRXc6/u20V8fBpyyQyw4VpjI0d+y3dKINM0DN3bj8C8bVcVVlf7k657Y7eC0l C1PNjG/V/Im5mFkoETZImcdpJ9/+aiX7oXo7pgjfoqk1pwugErWo42C1OW9+7z/62bmZ EwPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=uLGeaz8NT3BELQqUMecnx7omQ6ldn4HqO+/oulNlF3g=; b=ugFYaC+0LVHDw4HlYOjLw6LqXLxnaJtP6OFWaXsv2AZ2awMMSsdRX/CgwWul8whJ0a sq2n4SOADrAkwWgBU0V03DnOJgXDFLW84wL9Du+hezgtGXs+RkpvwnXX2KyUu88eBQMq D4UsSgTpxyhtGJ1kZxHZRxMCyvV/sFOTTBR/puh/8jifwgDiCCWIvR4oziACmZxyH9Fq XPQHk8Y9j4EjO/43Eo7gY5w0GoDomQCkKdaSVH0vNHcRcRUtl1gYPNmhW6092Se9OHSO 92bZrvSfBMsMaEuqDWNvq6MpZU6L/qqItcrhV4mVkWbB+9JILdmDACSmXIb06HjLOF7D VrFw== X-Gm-Message-State: ANoB5pkzh3mzGE88mcCQ/pc6hMCDXdf72Ksvkq7d6pSU8d7Cn16HtLx5 G7rBHxNaW3Ue8c0/iE3uY7Yt/A== X-Received: by 2002:a0c:90f1:0:b0:4c6:8f2e:9a2 with SMTP id p104-20020a0c90f1000000b004c68f2e09a2mr52264929qvp.100.1669745282935; Tue, 29 Nov 2022 10:08:02 -0800 (PST) Received: from localhost ([2620:10d:c091:480::1:ea9a]) by smtp.gmail.com with ESMTPSA id f3-20020a05622a114300b003434d3b5938sm8911215qty.2.2022.11.29.10.08.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Nov 2022 10:08:02 -0800 (PST) Date: Tue, 29 Nov 2022 13:08:01 -0500 From: Johannes Weiner To: "Huang, Ying" Cc: Mina Almasry , Yang Shi , Yosry Ahmed , Tim Chen , weixugc@google.com, shakeelb@google.com, gthelen@google.com, fvdl@google.com, Michal Hocko , Roman Gushchin , Muchun Song , Andrew Morton , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org Subject: Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim Message-ID: References: <20221122203850.2765015-1-almasrymina@google.com> <874juonbmv.fsf@yhuang6-desk2.ccr.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <874juonbmv.fsf@yhuang6-desk2.ccr.corp.intel.com> X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Ying, On Thu, Nov 24, 2022 at 01:51:20PM +0800, Huang, Ying wrote: > Johannes Weiner writes: > > The fallback to reclaim actually strikes me as wrong. > > > > Think of reclaim as 'demoting' the pages to the storage tier. If we > > have a RAM -> CXL -> storage hierarchy, we should demote from RAM to > > CXL and from CXL to storage. If we reclaim a page from RAM, it means > > we 'demote' it directly from RAM to storage, bypassing potentially a > > huge amount of pages colder than it in CXL. That doesn't seem right. > > > > If demotion fails, IMO it shouldn't satisfy the reclaim request by > > breaking the layering. Rather it should deflect that pressure to the > > lower layers to make room. This makes sure we maintain an aging > > pipeline that honors the memory tier hierarchy. > > Yes. I think that we should avoid to fall back to reclaim as much as > possible too. Now, when we allocate memory for demotion > (alloc_demote_page()), __GFP_KSWAPD_RECLAIM is used. So, we will trigger > kswapd reclaim on lower tier node to free some memory to avoid fall back > to reclaim on current (higher tier) node. This may be not good enough, > for example, the following patch from Hasan may help via waking up > kswapd earlier. > > https://lore.kernel.org/linux-mm/b45b9bf7cd3e21bca61d82dcd1eb692cd32c122c.1637778851.git.hasanalmaruf@fb.com/ > > Do you know what is the next step plan for this patch? > > Should we do even more? > > From another point of view, I still think that we can use falling back > to reclaim as the last resort to avoid OOM in some special situations, > for example, most pages in the lowest tier node are mlock() or too hot > to be reclaimed. If they're hotter than reclaim candidates on the toptier, shouldn't they get promoted instead and make room that way? We may have to tweak the watermark logic a bit to facilitate that (allow promotions where regular allocations already fail?). But this sort of resorting would be preferable to age inversions. The mlock scenario sounds possible. In that case, it wouldn't be an aging inversion, since there is nothing colder on the CXL node. Maybe a bypass check should explicitly consult the demotion target watermarks against its evictable pages (similar to the file_is_tiny check in prepare_scan_count)? Because in any other scenario, if there is a bug in the promo/demo coordination, I think we'd rather have the OOM than deal with age inversions causing intermittent performance issues that are incredibly hard to track down.