Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp1488414rwb; Thu, 1 Dec 2022 18:51:35 -0800 (PST) X-Google-Smtp-Source: AA0mqf7Se0TxBy8SrhWjzrY4waswtT3pnQ4y9EwXEHQUC7YlALOtnX2PdB07ps7xGqqM/+TdrCgG X-Received: by 2002:a17:906:1844:b0:77a:4bfe:d6df with SMTP id w4-20020a170906184400b0077a4bfed6dfmr59594056eje.396.1669949495152; Thu, 01 Dec 2022 18:51:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669949495; cv=none; d=google.com; s=arc-20160816; b=ui1MynEh2i9eISu2d0mMNVtQzB9lM1Fv0WWKDfaxVTV+u1s759nph8+/rEsGSHUr6v ggcbmVTRwTlhM4XYbkiMIN4QDl95sxV0nLRhkmkgF14CGBmrEipLbiChhr3fItnbCIGj D+EK0Ve8XZTbDrDG8la4x89BOMJLmiIM5u1K4ohitLJJA+A2BvPROmbhRemVpMZAf+9M juT58sddv/7cyItMSV5xMdLWosXW3FuB8C148WjNHLpoccqmj9XcM0XEKmdFpoqxFx2h 7pw0JhbHzF7FL7KtRiRoKHYrstmoowSG8trEh/qMs2M486pAWYKzxZm0a3VV58EKoQUG 5P7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=1N+8R3B0R8scfqdo5hcRRcXMxMacRIFBiFcRxzf5Jso=; b=Jj4MRZd+5/NTPCQpkJZ9PII4tnQNGcXein+XdEE2sV4G5tdrByM/3F1aKjAFvqkOYe gmF0aM29mgBHOX+CShLfbuUixu9WRq48GwyuiImPfl9L//PuNIK/omlQMy/Qz3TZyklh JgV538cVhHd7eHceeh1dELWV4NmQEC0F/Rpe2IQau3TzzMiVWVlUzxUz03d1TexUBS5k xzX0Rt8GTEJR2Mvi7pcq3bwxXFabqnR9jqzoBNm6SqoOYnLfx2/bRenGfvdFLmOe72tq /TVTtbJ4oQ2qZuZsVXXrktCM/NQ3hHLxfOcvOWBCVlNkWw+zst/i8LBvbP/gxjttCVgR dbvw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=FGvlzpkl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hb44-20020a170907162c00b00757bd7f53dcsi5033597ejc.14.2022.12.01.18.51.14; Thu, 01 Dec 2022 18:51:35 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=FGvlzpkl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231843AbiLBCGj (ORCPT + 81 others); Thu, 1 Dec 2022 21:06:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55542 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231402AbiLBCGd (ORCPT ); Thu, 1 Dec 2022 21:06:33 -0500 Received: from mail-vs1-xe33.google.com (mail-vs1-xe33.google.com [IPv6:2607:f8b0:4864:20::e33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87BFAD4AD2 for ; Thu, 1 Dec 2022 18:06:31 -0800 (PST) Received: by mail-vs1-xe33.google.com with SMTP id 3so2551836vsq.7 for ; Thu, 01 Dec 2022 18:06:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=1N+8R3B0R8scfqdo5hcRRcXMxMacRIFBiFcRxzf5Jso=; b=FGvlzpklux1eQvCCAhLE2Dtsmq6kxEJYMyrDPhWhi4Ffz08/w1SdCuiWgYdyoClJrY a0R+QzFLk3lpRqYN6QRoKAZl3xnbErPu1BCj04OFhGMBvJ7ekKm6VjmUMppeSXtW/nh9 APmORNQZQH1gsBgsfLGMdXY4tCY/CITBgZjmetWkAQOJ98MvFniyGzZ9jCfv29rmJbym GeG4mcozIeZ6jOER3fKuVByTFou9p4ezpVgtmMEfyoB+nyhDGjXA6MifkrozZS75L1D5 vzBn/PAYaRr5383y41JHtPWyAYdDU+3I8ThF3FCuAqrene7IRm3XyJb6NWsEhvGhg426 jxLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=1N+8R3B0R8scfqdo5hcRRcXMxMacRIFBiFcRxzf5Jso=; b=EzPnJ3tur0oRwgbIEc9cF6ov6WRr3d04THY2lC6momr8n+HieuKGIuXBx6GDmE4WWA gSuf74onKPke7c2hmzftdaNSRD8GFlTLhxDKaBvOp8JwZq4iC4EG2OG9NOueoisj0xuG EwtPZ4ULPz7UTAsWZmPktOYhSrNExoMUlOTyl7JHFlO0Elw9OpdZE+H9YoxYiHY8byFq oEtmvc2mQyFa9BLqZnNCj7ZtnBqsyMBraprFiOIQZKAFUq4RktOnLsuEE9sxfi6b+D8P 1DivA2IG+15Olwx6r5utxID5dSrPuIE4ao8+yjiK38z77P/3wOBrl7fv9Sd2sh25if0P ZcPA== X-Gm-Message-State: ANoB5plSypk36PX4Ux1h8SOhsGw90Bm5Z2K8W3YEp/FDF2KDgByHNt4W NN4KI3lAezem+Ofna8o133Q6SyxZiw0e3m3RKF63Kw== X-Received: by 2002:a05:6102:1c5:b0:3b0:4f9b:238d with SMTP id s5-20020a05610201c500b003b04f9b238dmr28776025vsq.59.1669946790569; Thu, 01 Dec 2022 18:06:30 -0800 (PST) MIME-Version: 1.0 References: <20221122203850.2765015-1-almasrymina@google.com> <874juonbmv.fsf@yhuang6-desk2.ccr.corp.intel.com> <87wn7dayfz.fsf@yhuang6-desk2.ccr.corp.intel.com> <87tu2e36nw.fsf@yhuang6-desk2.ccr.corp.intel.com> In-Reply-To: <87tu2e36nw.fsf@yhuang6-desk2.ccr.corp.intel.com> From: Mina Almasry Date: Thu, 1 Dec 2022 18:06:18 -0800 Message-ID: Subject: Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim To: "Huang, Ying" Cc: Johannes Weiner , Yang Shi , Yosry Ahmed , Tim Chen , weixugc@google.com, shakeelb@google.com, gthelen@google.com, fvdl@google.com, Michal Hocko , Roman Gushchin , Muchun Song , Andrew Morton , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 1, 2022 at 6:02 PM Huang, Ying wrote: > > Mina Almasry writes: > > > On Tue, Nov 29, 2022 at 7:56 PM Huang, Ying wrote: > >> > >> Johannes Weiner writes: > >> > >> > Hello Ying, > >> > > >> > On Thu, Nov 24, 2022 at 01:51:20PM +0800, Huang, Ying wrote: > >> >> Johannes Weiner writes: > >> >> > The fallback to reclaim actually strikes me as wrong. > >> >> > > >> >> > Think of reclaim as 'demoting' the pages to the storage tier. If we > >> >> > have a RAM -> CXL -> storage hierarchy, we should demote from RAM to > >> >> > CXL and from CXL to storage. If we reclaim a page from RAM, it means > >> >> > we 'demote' it directly from RAM to storage, bypassing potentially a > >> >> > huge amount of pages colder than it in CXL. That doesn't seem right. > >> >> > > >> >> > If demotion fails, IMO it shouldn't satisfy the reclaim request by > >> >> > breaking the layering. Rather it should deflect that pressure to the > >> >> > lower layers to make room. This makes sure we maintain an aging > >> >> > pipeline that honors the memory tier hierarchy. > >> >> > >> >> Yes. I think that we should avoid to fall back to reclaim as much as > >> >> possible too. Now, when we allocate memory for demotion > >> >> (alloc_demote_page()), __GFP_KSWAPD_RECLAIM is used. So, we will trigger > >> >> kswapd reclaim on lower tier node to free some memory to avoid fall back > >> >> to reclaim on current (higher tier) node. This may be not good enough, > >> >> for example, the following patch from Hasan may help via waking up > >> >> kswapd earlier. > >> >> > >> >> https://lore.kernel.org/linux-mm/b45b9bf7cd3e21bca61d82dcd1eb692cd32c122c.1637778851.git.hasanalmaruf@fb.com/ > >> >> > >> >> Do you know what is the next step plan for this patch? > >> >> > >> >> Should we do even more? > >> >> > >> >> From another point of view, I still think that we can use falling back > >> >> to reclaim as the last resort to avoid OOM in some special situations, > >> >> for example, most pages in the lowest tier node are mlock() or too hot > >> >> to be reclaimed. > >> > > >> > If they're hotter than reclaim candidates on the toptier, shouldn't > >> > they get promoted instead and make room that way? We may have to tweak > >> > the watermark logic a bit to facilitate that (allow promotions where > >> > regular allocations already fail?). But this sort of resorting would > >> > be preferable to age inversions. > >> > >> Now it's legal to enable demotion and disable promotion. Yes, this is > >> wrong configuration in general. But should we trigger OOM for these > >> users? > >> > >> And now promotion only works for default NUMA policy (and MPOL_BIND to > >> both promotion source and target nodes with MPOL_F_NUMA_BALANCING). If > >> we use some other NUMA policy, the pages cannot be promoted too. > >> > >> > The mlock scenario sounds possible. In that case, it wouldn't be an > >> > aging inversion, since there is nothing colder on the CXL node. > >> > > >> > Maybe a bypass check should explicitly consult the demotion target > >> > watermarks against its evictable pages (similar to the file_is_tiny > >> > check in prepare_scan_count)? > >> > >> Yes. This sounds doable. > >> > >> > Because in any other scenario, if there is a bug in the promo/demo > >> > coordination, I think we'd rather have the OOM than deal with age > >> > inversions causing intermittent performance issues that are incredibly > >> > hard to track down. > >> > >> Previously, I thought that people will always prefer performance > >> regression than OOM. Apparently, I am wrong. > >> > >> Anyway, I think that we need to reduce the possibility of OOM or falling > >> back to reclaim as much as possible firstly. Do you agree? > >> > > > > I've been discussing this with a few folks here. I think FWIW general > > feeling here is that demoting from top tier nodes is preferred, except > > in extreme circumstances we would indeed like to run with a > > performance issue rather than OOM a customer VM. I wonder if there is > > another way to debug mis-tiered pages rather than trigger an oom to > > debug. > > > > One thing I think/hope we can trivially agree on is that proactive > > reclaim/demotion is _not_ an extreme circumstance. I would like me or > > someone from the team to follow up with a patch that disables fallback > > to reclaim on proactive reclaim/demotion (sc->proactive). > > Yes. This makes sense to me. > Glad to hear it. Patch is already sent for review btw: https://lore.kernel.org/linux-mm/20221201233317.1394958-1-almasrymina@google.com/T/ > Best Regards, > Huang, Ying > > >> One possibility, can we fall back to reclaim only if the sc->priority is > >> small enough (even 0)? > >> > > > > This makes sense to me. > > > >> Best Regards, > >> Huang, Ying > >> >