Subject: [RFC][PATCH 9/9] mm/migrate: new zone_reclaim_mode to enable reclaim migration
To: linux-kernel@vger.kernel.org
Cc: Dave Hansen, yang.shi@linux.alibaba.com, rientjes@google.com, ying.huang@intel.com, dan.j.williams@intel.com
From: Dave Hansen
Date: Tue, 18 Aug 2020 11:41:38 -0700
References:
 <20200818184122.29C415DF@viggo.jf.intel.com>
In-Reply-To: <20200818184122.29C415DF@viggo.jf.intel.com>
Message-Id: <20200818184138.1325BE39@viggo.jf.intel.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Dave Hansen

Some method is obviously needed to enable reclaim-based migration.

Just like traditional autonuma, some workloads will benefit, such as
those with more "static" configurations where hot pages stay hot and
cold pages stay cold.  If pages come and go from the hot and cold sets,
the benefits of this approach will be more limited.

The benefits are truly workload-based and *not* hardware-based.
We do not believe that there is a viable threshold where certain
hardware configurations should have this mechanism enabled while
others do not.

To be conservative, earlier work defaulted to disabling reclaim-based
migration and did not include a mechanism to enable it.  This patch
proposes extending the existing "zone_reclaim_mode" (now really
node_reclaim_mode) as a method to enable it.

We are open to any alternative that allows end users to enable
this mechanism, or to disable it if workload harm is detected (just
like traditional autonuma).

Signed-off-by: Dave Hansen
Cc: Yang Shi
Cc: David Rientjes
Cc: Huang Ying
Cc: Dan Williams
---

 b/Documentation/admin-guide/sysctl/vm.rst |    9 +++++++++
 b/include/linux/swap.h                    |    3 ++-
 b/include/uapi/linux/mempolicy.h          |    1 +
 b/mm/vmscan.c                             |    6 ++++--
 4 files changed, 16 insertions(+), 3 deletions(-)

diff -puN Documentation/admin-guide/sysctl/vm.rst~RECLAIM_MIGRATE Documentation/admin-guide/sysctl/vm.rst
--- a/Documentation/admin-guide/sysctl/vm.rst~RECLAIM_MIGRATE	2020-08-18 11:36:57.516583167 -0700
+++ b/Documentation/admin-guide/sysctl/vm.rst	2020-08-18 11:36:57.527583167 -0700
@@ -954,6 +954,7 @@ This is value OR'ed together of
 1	Zone reclaim on
 2	Zone reclaim writes dirty pages out
 4	Zone reclaim swaps pages
+8	Zone reclaim migrates pages
 =	===================================

 zone_reclaim_mode is disabled by default.  For file servers or workloads
@@ -978,3 +979,11 @@ of other processes running on other node
 Allowing regular swap effectively restricts allocations to the local
 node unless explicitly overridden by memory policies or cpuset
 configurations.
+
+Page migration during reclaim is intended for systems with tiered memory
+configurations.  These systems have multiple types of memory with varied
+performance characteristics instead of plain NUMA systems where the same
+kind of memory is found at varied distances.  Allowing page migration
+during reclaim enables these systems to migrate pages from fast tiers to
+slow tiers when the fast tier is under pressure.  This migration is
+performed before swap.
diff -puN include/linux/swap.h~RECLAIM_MIGRATE include/linux/swap.h
--- a/include/linux/swap.h~RECLAIM_MIGRATE	2020-08-18 11:36:57.518583167 -0700
+++ b/include/linux/swap.h	2020-08-18 11:36:57.527583167 -0700
@@ -387,7 +387,8 @@ extern int sysctl_min_slab_ratio;
 static inline bool node_reclaim_enabled(void)
 {
 	/* Is any node_reclaim_mode bit set? */
-	return node_reclaim_mode & (RECLAIM_ZONE|RECLAIM_WRITE|RECLAIM_UNMAP);
+	return node_reclaim_mode & (RECLAIM_ZONE |RECLAIM_WRITE|
+				    RECLAIM_UNMAP|RECLAIM_MIGRATE);
 }

 extern void check_move_unevictable_pages(struct pagevec *pvec);
diff -puN include/uapi/linux/mempolicy.h~RECLAIM_MIGRATE include/uapi/linux/mempolicy.h
--- a/include/uapi/linux/mempolicy.h~RECLAIM_MIGRATE	2020-08-18 11:36:57.520583167 -0700
+++ b/include/uapi/linux/mempolicy.h	2020-08-18 11:36:57.527583167 -0700
@@ -69,5 +69,6 @@ enum {
 #define RECLAIM_ZONE	(1<<0)	/* Run shrink_inactive_list on the zone */
 #define RECLAIM_WRITE	(1<<1)	/* Writeout pages during reclaim */
 #define RECLAIM_UNMAP	(1<<2)	/* Unmap pages during reclaim */
+#define RECLAIM_MIGRATE	(1<<3)	/* Migrate to other nodes during reclaim */

 #endif /* _UAPI_LINUX_MEMPOLICY_H */
diff -puN mm/vmscan.c~RECLAIM_MIGRATE mm/vmscan.c
--- a/mm/vmscan.c~RECLAIM_MIGRATE	2020-08-18 11:36:57.522583167 -0700
+++ b/mm/vmscan.c	2020-08-18 11:36:57.528583167 -0700
@@ -1083,6 +1083,9 @@ bool migrate_demote_page_ok(struct page
 	VM_BUG_ON_PAGE(PageHuge(page), page);
 	VM_BUG_ON_PAGE(PageLRU(page), page);

+	if (!(node_reclaim_mode & RECLAIM_MIGRATE))
+		return false;
+
 	/* It is pointless to do demotion in memcg reclaim */
 	if (cgroup_reclaim(sc))
 		return false;
@@ -1092,8 +1095,7 @@ bool migrate_demote_page_ok(struct page

 	if (PageTransHuge(page) && !thp_migration_supported())
 		return false;

-	// FIXME: actually enable this later in the series
-	return false;
+	return true;
 }

_
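
To make the bit semantics concrete, here is a minimal user-space sketch,
not kernel code.  The RECLAIM_* values are copied from the mempolicy.h
hunk above, and RECLAIM_MIGRATE (8) exists only as proposed in this RFC.
mode_enables_reclaim() and mode_allows_demotion() are illustrative
stand-ins for node_reclaim_enabled() and for the new check at the top of
migrate_demote_page_ok(); describe_mode() is a hypothetical helper that
does not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit values as they appear in the mempolicy.h hunk of this patch. */
#define RECLAIM_ZONE	(1 << 0)
#define RECLAIM_WRITE	(1 << 1)
#define RECLAIM_UNMAP	(1 << 2)
#define RECLAIM_MIGRATE	(1 << 3)	/* proposed by this RFC */

/* Models node_reclaim_enabled(): true if any known mode bit is set. */
static bool mode_enables_reclaim(int mode)
{
	return (mode & (RECLAIM_ZONE | RECLAIM_WRITE |
			RECLAIM_UNMAP | RECLAIM_MIGRATE)) != 0;
}

/* Models the first check this patch adds to migrate_demote_page_ok(). */
static bool mode_allows_demotion(int mode)
{
	return (mode & RECLAIM_MIGRATE) != 0;
}

/*
 * Hypothetical helper: writes a "a|b|c" summary of the set bits into
 * buf and returns how many known bits were written.
 */
static int describe_mode(int mode, char *buf, size_t len)
{
	static const struct {
		int bit;
		const char *name;
	} flags[] = {
		{ RECLAIM_ZONE,    "zone"    },
		{ RECLAIM_WRITE,   "write"   },
		{ RECLAIM_UNMAP,   "unmap"   },
		{ RECLAIM_MIGRATE, "migrate" },
	};
	size_t used = 0;
	int n = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		int w;

		if (!(mode & flags[i].bit))
			continue;

		w = snprintf(buf + used, len - used, "%s%s",
			     n ? "|" : "", flags[i].name);
		if (w < 0 || (size_t)w >= len - used)
			break;	/* stop once the buffer is full */

		used += (size_t)w;
		n++;
	}

	return n;
}
```

Under these assumptions, a mode of 9 (1|8) describes as "zone|migrate"
and allows demotion, while a mode of 7 enables node reclaim but leaves
demotion off.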