Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp2163702pxb; Mon, 8 Mar 2021 16:27:28 -0800 (PST) X-Google-Smtp-Source: ABdhPJypOyZzBS/TL2Gc6KLS/Pb/nSKFgx+sPNWhWFuWw6aIzAcYx6W3EZqGH6FKsEenE0nFKpIH X-Received: by 2002:a50:fa92:: with SMTP id w18mr1143444edr.172.1615249648053; Mon, 08 Mar 2021 16:27:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1615249648; cv=none; d=google.com; s=arc-20160816; b=DTzrnSsdfjaRtW1kKsFuAmTpotQcR61knkQOoNhQaaFAFy/u/nDAWbkl5Dio19Sz+A nIDeQ1w9oIyUAqSJZs8Ai6wlGB2bc7vbFpBNFJ7xkIlFQsPKMICUbEdRAmu5s2HfBBII hnSDs0TrJG+9VOSt5DQw0twvvSWE4ah+dJlCW77rLT+yQ8ZZ9WwY37T4isupfdbRNOl+ giALRA7Fj7Kpu+JU9m7SURztODZnCp6RY1s9Jy4MVO2afhG+wYVpI49VTXoDjSXFk0Xc BHzOHP4TxnwUugMq3QE79dVQlZFZA9XgaJH/OGEBN2/oMAUAwFs7td2I21QBYjWtASlS uU6w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=3ldMzFuy7XtYOaxXkLzzP1hP70OgEV2wYJ6wrQAzlas=; b=v6P82Pq9eIkcKARKOly0L9O58/bB6cNBb//VD4keFrOCeNzb9wPEN0LVAnTdGdz1yd AGJ+wJ2h41g7hxU815yRBA1GlZWVfl17Nj9cAaLsmsPvoxGgD1e33vJoQFPPt/GdwxdQ IxPtQsXGl4Hfexb11Qc1QkAWb72zYlKXx64j3H4qGqsYKiwXIf2AMDbDamozS60vAamF TiKirSKz0ibtiBdk3Y5MfDIzu5rxRsYQ9+W1WuaOXnf0CM6/hDWGCReQAXlQSQZRyisG K+0fP6MCPmcfSwdOIRrJFe1X31DSBwqZCbp5gZgl0jMAHr8QEokdJkoD5ZOS074jMabl 0hgw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=jeNcCOlv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h15si3730168edb.566.2021.03.08.16.27.05; Mon, 08 Mar 2021 16:27:28 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=jeNcCOlv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231246AbhCIAZA (ORCPT + 99 others); Mon, 8 Mar 2021 19:25:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229471AbhCIAYf (ORCPT ); Mon, 8 Mar 2021 19:24:35 -0500 Received: from mail-ed1-x52e.google.com (mail-ed1-x52e.google.com [IPv6:2a00:1450:4864:20::52e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B1670C06174A for ; Mon, 8 Mar 2021 16:24:34 -0800 (PST) Received: by mail-ed1-x52e.google.com with SMTP id h10so17452578edl.6 for ; Mon, 08 Mar 2021 16:24:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=3ldMzFuy7XtYOaxXkLzzP1hP70OgEV2wYJ6wrQAzlas=; b=jeNcCOlvTrbIsMXCB9B6QVX6lMNpuitbEDi0X04q8hWbGrev263PWPr2QoiA8k71ch 5OVBnlQs+aS6S8p0MgmX1o6RLLvefrRBoBXH7kLWLyG6lckBVa2BWQmRRArDVA59KPgf ZOodP4NxXnzzcJ1rNtplBARmQR4SNumGLYolR9f4g0GS2cgh9KjaLEmcQWtfRd+djJzY 2A/O8g01KBqgXEhuAu08W1AoLP3WHXYIvQZJ8sfuBJR7IF31upKtwwH4LKrygW+k9x9J GPYky+eN+oR1Qb/zJJITFPZzAHljYbye+E4oe1NrBWfG4XkEyJZx53UZrY3T6wV0u6S/ WtcQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=3ldMzFuy7XtYOaxXkLzzP1hP70OgEV2wYJ6wrQAzlas=; b=NCDGj2HXx+KDfbIzMb5iFz0lp9t1kn1ouu3TxrqWCEHSUBU0Yl/Ys5umQTOI20hPIv qd4eOpPA0SjFJUO7KAs70ShK8DNczmWFYOTdAjHeBk+FuIgZiwx5vbiA2GA9ADajhWI5 PWzg/UQq+zUEsXEWft0d8KUfA7fofxZ2fH4D0ZdcChNsfYns4WwX4P99FQ/SZte4Ebi3 AO7cgDUTLFXWImY/P9Xk7JS1FZOp2vuUOJu4glpCC9AiYjKuLn5dSaRlMmsf+xnWdOwm v6k1wpexNv5Wdim591mD+Rr1NEbGuv3rSBA9qUGsUM+95kRYX3sZdLjc2S91dHnyXdVd bn5g== X-Gm-Message-State: AOAM531GPmMzbi+z41eed9vlg9qdDoS5PvdV75ox+NCtxsNCPjEsl24N 7mb4gOzUR+mwt7l7VWD/BAUzMu/yfmmALjFIUfc= X-Received: by 2002:aa7:cc03:: with SMTP id q3mr1139714edt.366.1615249473522; Mon, 08 Mar 2021 16:24:33 -0800 (PST) MIME-Version: 1.0 References: <20210304235949.7922C1C3@viggo.jf.intel.com> <20210305000009.EDF902E9@viggo.jf.intel.com> In-Reply-To: <20210305000009.EDF902E9@viggo.jf.intel.com> From: Yang Shi Date: Mon, 8 Mar 2021 16:24:22 -0800 Message-ID: Subject: Re: [PATCH 10/10] mm/migrate: new zone_reclaim_mode to enable reclaim migration To: Dave Hansen Cc: Linux Kernel Mailing List , Linux MM , Yang Shi , David Rientjes , Huang Ying , Dan Williams , David Hildenbrand , Oscar Salvador Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 4, 2021 at 4:01 PM Dave Hansen wrote: > > > From: Dave Hansen > > Some method is obviously needed to enable reclaim-based migration. > > Just like traditional autonuma, there will be some workloads that > will benefit like workloads with more "static" configurations where > hot pages stay hot and cold pages stay cold. If pages come and go > from the hot and cold sets, the benefits of this approach will be > more limited. > > The benefits are truly workload-based and *not* hardware-based. > We do not believe that there is a viable threshold where certain > hardware configurations should have this mechanism enabled while > others do not. > > To be conservative, earlier work defaulted to disable reclaim- > based migration and did not include a mechanism to enable it. > This proposes extending the existing "zone_reclaim_mode" (now > now really node_reclaim_mode) as a method to enable it. > > We are open to any alternative that allows end users to enable > this mechanism or disable it it workload harm is detected (just > like traditional autonuma). > > Once this is enabled page demotion may move data to a NUMA node > that does not fall into the cpuset of the allocating process. > This could be construed to violate the guarantees of cpusets. > However, since this is an opt-in mechanism, the assumption is > that anyone enabling it is content to relax the guarantees. I think we'd better have the cpuset violation paragraph along with new zone reclaim mode text so that the users are aware of the potential violation. I don't think commit log is the to-go place for any plain users. > > Signed-off-by: Dave Hansen > Cc: Yang Shi > Cc: David Rientjes > Cc: Huang Ying > Cc: Dan Williams > Cc: David Hildenbrand > Cc: osalvador > > changes since 20200122: > * Changelog material about relaxing cpuset constraints > --- > > b/Documentation/admin-guide/sysctl/vm.rst | 9 +++++++++ > b/include/linux/swap.h | 3 ++- > b/include/uapi/linux/mempolicy.h | 1 + > b/mm/vmscan.c | 6 ++++-- > 4 files changed, 16 insertions(+), 3 deletions(-) > > diff -puN Documentation/admin-guide/sysctl/vm.rst~RECLAIM_MIGRATE Documentation/admin-guide/sysctl/vm.rst > --- a/Documentation/admin-guide/sysctl/vm.rst~RECLAIM_MIGRATE 2021-03-04 15:36:26.078806355 -0800 > +++ b/Documentation/admin-guide/sysctl/vm.rst 2021-03-04 15:36:26.093806355 -0800 > @@ -976,6 +976,7 @@ This is value OR'ed together of > 1 Zone reclaim on > 2 Zone reclaim writes dirty pages out > 4 Zone reclaim swaps pages > +8 Zone reclaim migrates pages > = =================================== > > zone_reclaim_mode is disabled by default. For file servers or workloads > @@ -1000,3 +1001,11 @@ of other processes running on other node > Allowing regular swap effectively restricts allocations to the local > node unless explicitly overridden by memory policies or cpuset > configurations. > + > +Page migration during reclaim is intended for systems with tiered memory > +configurations. These systems have multiple types of memory with varied > +performance characteristics instead of plain NUMA systems where the same > +kind of memory is found at varied distances. Allowing page migration > +during reclaim enables these systems to migrate pages from fast tiers to > +slow tiers when the fast tier is under pressure. This migration is > +performed before swap. > diff -puN include/linux/swap.h~RECLAIM_MIGRATE include/linux/swap.h > --- a/include/linux/swap.h~RECLAIM_MIGRATE 2021-03-04 15:36:26.082806355 -0800 > +++ b/include/linux/swap.h 2021-03-04 15:36:26.093806355 -0800 > @@ -382,7 +382,8 @@ extern int sysctl_min_slab_ratio; > static inline bool node_reclaim_enabled(void) > { > /* Is any node_reclaim_mode bit set? */ > - return node_reclaim_mode & (RECLAIM_ZONE|RECLAIM_WRITE|RECLAIM_UNMAP); > + return node_reclaim_mode & (RECLAIM_ZONE |RECLAIM_WRITE| > + RECLAIM_UNMAP|RECLAIM_MIGRATE); > } > > extern void check_move_unevictable_pages(struct pagevec *pvec); > diff -puN include/uapi/linux/mempolicy.h~RECLAIM_MIGRATE include/uapi/linux/mempolicy.h > --- a/include/uapi/linux/mempolicy.h~RECLAIM_MIGRATE 2021-03-04 15:36:26.084806355 -0800 > +++ b/include/uapi/linux/mempolicy.h 2021-03-04 15:36:26.094806355 -0800 > @@ -69,5 +69,6 @@ enum { > #define RECLAIM_ZONE (1<<0) /* Run shrink_inactive_list on the zone */ > #define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */ > #define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */ > +#define RECLAIM_MIGRATE (1<<3) /* Migrate to other nodes during reclaim */ > > #endif /* _UAPI_LINUX_MEMPOLICY_H */ > diff -puN mm/vmscan.c~RECLAIM_MIGRATE mm/vmscan.c > --- a/mm/vmscan.c~RECLAIM_MIGRATE 2021-03-04 15:36:26.087806355 -0800 > +++ b/mm/vmscan.c 2021-03-04 15:36:26.096806355 -0800 > @@ -1073,6 +1073,9 @@ static bool migrate_demote_page_ok(struc > VM_BUG_ON_PAGE(PageHuge(page), page); > VM_BUG_ON_PAGE(PageLRU(page), page); > > + if (!(node_reclaim_mode & RECLAIM_MIGRATE)) > + return false; > + > /* It is pointless to do demotion in memcg reclaim */ > if (cgroup_reclaim(sc)) > return false; > @@ -1082,8 +1085,7 @@ static bool migrate_demote_page_ok(struc > if (PageTransHuge(page) && !thp_migration_supported()) > return false; > > - // FIXME: actually enable this later in the series > - return false; > + return true; > } > > /* Check if a page is dirty or under writeback */ > _ >