From: Yang Shi
Date: Thu, 21 Mar 2019 16:58:16 -0700
Subject: Re: [PATCH 3/5] mm: Attempt to migrate page in lieu of discard
To: Keith Busch
Cc: Linux Kernel Mailing List, Linux MM, linux-nvdimm@lists.01.org, Dave Hansen, Dan Williams
In-Reply-To: <20190321200157.29678-4-keith.busch@intel.com>
References: <20190321200157.29678-1-keith.busch@intel.com> <20190321200157.29678-4-keith.busch@intel.com>
List-ID: linux-kernel@vger.kernel.org

On Thu, Mar 21, 2019 at 1:03 PM Keith Busch wrote:
>
> If a memory node has a preferred migration path to demote cold pages,
> attempt to move those inactive pages to that migration node before
> reclaiming. This will better utilize available memory, provide a faster
> tier than swapping or discarding, and allow such pages to be reused
> immediately without IO to retrieve the data.
>
> Some places we would like to see this used:
>
>  1. Persistent memory being as a slower, cheaper DRAM replacement
>  2. Remote memory-only "expansion" NUMA nodes
>  3. Resolving memory imbalances where one NUMA node is seeing more
>     allocation activity than another. This helps keep more recent
>     allocations closer to the CPUs on the node doing the allocating.
>
> Signed-off-by: Keith Busch
> ---
>  include/linux/migrate.h        |  6 ++++++
>  include/trace/events/migrate.h |  3 ++-
>  mm/debug.c                     |  1 +
>  mm/migrate.c                   | 45 ++++++++++++++++++++++++++++++++++++++++++
>  mm/vmscan.c                    | 15 ++++++++++++++
>  5 files changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index e13d9bf2f9a5..a004cb1b2dbb 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -25,6 +25,7 @@ enum migrate_reason {
>  	MR_MEMPOLICY_MBIND,
>  	MR_NUMA_MISPLACED,
>  	MR_CONTIG_RANGE,
> +	MR_DEMOTION,
>  	MR_TYPES
>  };
>
> @@ -79,6 +80,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
>  extern int migrate_page_move_mapping(struct address_space *mapping,
>  		struct page *newpage, struct page *page, enum migrate_mode mode,
>  		int extra_count);
> +extern bool migrate_demote_mapping(struct page *page);
>  #else
>
>  static inline void putback_movable_pages(struct list_head *l) {}
> @@ -105,6 +107,10 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
>  	return -ENOSYS;
>  }
>
> +static inline bool migrate_demote_mapping(struct page *page)
> +{
> +	return false;
> +}
>  #endif /* CONFIG_MIGRATION */
>
>  #ifdef CONFIG_COMPACTION
> diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
> index 705b33d1e395..d25de0cc8714 100644
> --- a/include/trace/events/migrate.h
> +++ b/include/trace/events/migrate.h
> @@ -20,7 +20,8 @@
>  	EM( MR_SYSCALL,		"syscall_or_cpuset")	\
>  	EM( MR_MEMPOLICY_MBIND,	"mempolicy_mbind")	\
>  	EM( MR_NUMA_MISPLACED,	"numa_misplaced")	\
> -	EMe(MR_CONTIG_RANGE,	"contig_range")
> +	EM( MR_CONTIG_RANGE,	"contig_range")		\
> +	EMe(MR_DEMOTION,	"demotion")
>
>  /*
>   * First define the enums in the above macros to be exported to userspace
> diff --git a/mm/debug.c b/mm/debug.c
> index c0b31b6c3877..53d499f65199 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -25,6 +25,7 @@ const char *migrate_reason_names[MR_TYPES] = {
>  	"mempolicy_mbind",
>  	"numa_misplaced",
>  	"cma",
> +	"demotion",
>  };
>
>  const struct trace_print_flags pageflag_names[] = {
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 705b320d4b35..83fad87361bf 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1152,6 +1152,51 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  	return rc;
>  }
>
> +/**
> + * migrate_demote_mapping() - Migrate this page and its mappings to its
> + *                            demotion node.
> + * @page: An isolated, non-compound page that should move to
> + *        its current node's migration path.
> + *
> + * @returns: True if migrate demotion was successful, false otherwise
> + */
> +bool migrate_demote_mapping(struct page *page)
> +{
> +	int rc, next_nid = next_migration_node(page_to_nid(page));
> +	struct page *newpage;
> +
> +	/*
> +	 * The flags are set to allocate only on the desired node in the
> +	 * migration path, and to fail fast if not immediately available. We
> +	 * are already in the memory reclaim path, we don't want heroic
> +	 * efforts to get a page.
> +	 */
> +	gfp_t mask = GFP_NOWAIT | __GFP_NOWARN | __GFP_NORETRY |
> +			__GFP_NOMEMALLOC | __GFP_THISNODE;
> +
> +	VM_BUG_ON_PAGE(PageCompound(page), page);
> +	VM_BUG_ON_PAGE(PageLRU(page), page);
> +
> +	if (next_nid < 0)
> +		return false;
> +
> +	newpage = alloc_pages_node(next_nid, mask, 0);
> +	if (!newpage)
> +		return false;
> +
> +	/*
> +	 * MIGRATE_ASYNC is the most light weight and never blocks.
> +	 */
> +	rc = __unmap_and_move_locked(page, newpage, MIGRATE_ASYNC);
> +	if (rc != MIGRATEPAGE_SUCCESS) {
> +		__free_pages(newpage, 0);
> +		return false;
> +	}
> +
> +	set_page_owner_migrate_reason(newpage, MR_DEMOTION);
> +	return true;
> +}
> +
>  /*
>   * gcc 4.7 and 4.8 on arm get an ICEs when inlining unmap_and_move(). Work
>   * around it.
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a5ad0b35ab8e..0a95804e946a 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1261,6 +1261,21 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  			;	/* try to reclaim the page below */
>  		}
>
> +		if (!PageCompound(page)) {
> +			if (migrate_demote_mapping(page)) {
> +				unlock_page(page);
> +				if (likely(put_page_testzero(page)))
> +					goto free_it;
> +
> +				/*
> +				 * Speculative reference will free this page,
> +				 * so leave it off the LRU.
> +				 */
> +				nr_reclaimed++;
> +				continue;
> +			}
> +		}

It looks like the reclaim path just falls through when the migration fails. But then, with patch #4, you may end up trying to reclaim an anon page on a swapless system if migration fails?

And I actually have the same question as Yan Zi: why not put the demotion candidates on a separate list, then migrate all of the candidates in bulk with migrate_pages()?

Thanks,
Yang

> +
>  		/*
>  		 * Anonymous process memory has backing store?
>  		 * Try to allocate it some swap space here.
> --
> 2.14.4
>