From: Huang Ying
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Dave Hansen, "Huang, Ying",
    Michal Hocko, Wei Xu, Yang Shi, David Rientjes, Dan Williams,
    osalvador
Subject: [PATCH -V8 05/10] mm/migrate: demote pages during reclaim
Date: Fri, 18 Jun 2021 14:15:32 +0800
Message-Id: <20210618061537.434999-6-ying.huang@intel.com>
In-Reply-To: <20210618061537.434999-1-ying.huang@intel.com>
References: <20210618061537.434999-1-ying.huang@intel.com>

From: Dave Hansen

This is mostly derived from a patch from Yang Shi:

	https://lore.kernel.org/linux-mm/1560468577-101178-10-git-send-email-yang.shi@linux.alibaba.com/

Add code to the reclaim path (shrink_page_list()) to "demote" data to
another NUMA node instead of discarding the data.  This always avoids the
cost of I/O needed to read the page back in and sometimes avoids the
writeout cost when the page is dirty.

A second pass through shrink_page_list() will be made if any demotions
fail.  This essentially falls back to normal reclaim behavior in the case
that demotions fail.  Previous versions of this patch may have simply
failed to reclaim pages which were eligible for demotion but were unable
to be demoted in practice.

Note: This just adds the start of infrastructure for migration.  It is
actually disabled next to the FIXME in migrate_demote_page_ok().

Signed-off-by: Dave Hansen
Signed-off-by: "Huang, Ying"
Cc: Michal Hocko
Cc: Wei Xu
Cc: Yang Shi
Cc: David Rientjes
Cc: Dan Williams
Cc: osalvador

--

changes from 20210122:
 * move from GFP_HIGHUSER -> GFP_HIGHUSER_MOVABLE (Ying)

changes from 202010:
 * add MR_NUMA_MISPLACED to trace MIGRATE_REASON define
 * make migrate_demote_page_ok() static, remove 'sc' arg until
   later patch
 * remove unnecessary alloc_demote_page() hugetlb warning
 * Simplify alloc_demote_page() gfp mask.  Depend on __GFP_NORETRY
   to make it lightweight instead of fancier stuff like leaving out
   __GFP_IO/FS.
 * Allocate migration page with alloc_migration_target() instead of
   allocating directly.

changes from 20200730:
 * Add another pass through shrink_page_list() when demotion fails.
changes from 20210302:
 * Use __GFP_THISNODE and revise the comment explaining the GFP
   mask construction

---
 include/linux/migrate.h        |  9 ++++
 include/trace/events/migrate.h |  3 +-
 mm/vmscan.c                    | 83 ++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 4a49bb358787..42952cbe452b 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -28,6 +28,7 @@ enum migrate_reason {
 	MR_NUMA_MISPLACED,
 	MR_CONTIG_RANGE,
 	MR_LONGTERM_PIN,
+	MR_DEMOTION,
 	MR_TYPES
 };
 
@@ -191,6 +192,14 @@ struct migrate_vma {
 int migrate_vma_setup(struct migrate_vma *args);
 void migrate_vma_pages(struct migrate_vma *migrate);
 void migrate_vma_finalize(struct migrate_vma *migrate);
+int next_demotion_node(int node);
+
+#else /* CONFIG_MIGRATION disabled: */
+
+static inline int next_demotion_node(int node)
+{
+	return NUMA_NO_NODE;
+}
 
 #endif /* CONFIG_MIGRATION */
 
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 9fb2a3bbcdfb..779f3fad9ecd 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -21,7 +21,8 @@
 	EM( MR_MEMPOLICY_MBIND,	"mempolicy_mbind")	\
 	EM( MR_NUMA_MISPLACED,	"numa_misplaced")	\
 	EM( MR_CONTIG_RANGE,	"contig_range")		\
-	EMe(MR_LONGTERM_PIN,	"longterm_pin")
+	EM( MR_LONGTERM_PIN,	"longterm_pin")		\
+	EMe(MR_DEMOTION,	"demotion")
 
 /*
  * First define the enums in the above macros to be exported to userspace
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5199b9696bab..ddda32031f0c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -41,6 +41,7 @@
 #include <linux/kthread.h>
 #include <linux/freezer.h>
 #include <linux/memcontrol.h>
+#include <linux/migrate.h>
 #include <linux/delayacct.h>
 #include <linux/sysctl.h>
 #include <linux/oom.h>
@@ -1231,6 +1232,23 @@ static enum page_references page_check_references(struct page *page,
 	return PAGEREF_RECLAIM;
 }
 
+static bool migrate_demote_page_ok(struct page *page)
+{
+	int next_nid = next_demotion_node(page_to_nid(page));
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(PageHuge(page), page);
+	VM_BUG_ON_PAGE(PageLRU(page), page);
+
+	if (next_nid == NUMA_NO_NODE)
+		return false;
+	if (PageTransHuge(page) && !thp_migration_supported())
+		return false;
+
+	// FIXME: actually enable this later in the series
+	return false;
+}
+
 /* Check if a page is dirty or under writeback */
 static void page_check_dirty_writeback(struct page *page,
 				       bool *dirty, bool *writeback)
@@ -1261,6 +1279,47 @@ static void page_check_dirty_writeback(struct page *page,
 	mapping->a_ops->is_dirty_writeback(page, dirty, writeback);
 }
 
+static struct page *alloc_demote_page(struct page *page, unsigned long node)
+{
+	struct migration_target_control mtc = {
+		/*
+		 * Allocate from 'node', or fail quickly and quietly.
+		 * When this happens, 'page' will likely just be discarded
+		 * instead of migrated.
+		 */
+		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
+			    __GFP_THISNODE | __GFP_NOWARN |
+			    __GFP_NOMEMALLOC | GFP_NOWAIT,
+		.nid = node
+	};
+
+	return alloc_migration_target(page, (unsigned long)&mtc);
+}
+
+/*
+ * Take pages on @demote_pages and attempt to demote them to
+ * another node.  Pages which are not demoted are left on
+ * @demote_pages.
+ */
+static unsigned int demote_page_list(struct list_head *demote_pages,
+				     struct pglist_data *pgdat,
+				     struct scan_control *sc)
+{
+	int target_nid = next_demotion_node(pgdat->node_id);
+	unsigned int nr_succeeded = 0;
+	int err;
+
+	if (list_empty(demote_pages))
+		return 0;
+
+	/* Demotion ignores all cpuset and mempolicy settings */
+	err = migrate_pages(demote_pages, alloc_demote_page, NULL,
+			    target_nid, MIGRATE_ASYNC, MR_DEMOTION,
+			    &nr_succeeded);
+
+	return nr_succeeded;
+}
+
 /*
  * shrink_page_list() returns the number of reclaimed pages
  */
@@ -1272,12 +1331,15 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
+	LIST_HEAD(demote_pages);
 	unsigned int nr_reclaimed = 0;
 	unsigned int pgactivate = 0;
+	bool do_demote_pass = true;
 
 	memset(stat, 0, sizeof(*stat));
 	cond_resched();
 
+retry:
 	while (!list_empty(page_list)) {
 		struct address_space *mapping;
 		struct page *page;
@@ -1426,6 +1488,16 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 			; /* try to reclaim the page below */
 		}
 
+		/*
+		 * Before reclaiming the page, try to relocate
+		 * its contents to another node.
+		 */
+		if (do_demote_pass && migrate_demote_page_ok(page)) {
+			list_add(&page->lru, &demote_pages);
+			unlock_page(page);
+			continue;
+		}
+
 		/*
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
@@ -1676,6 +1748,17 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page);
 	}
+	/* 'page_list' is always empty here */
+
+	/* Migrate pages selected for demotion */
+	nr_reclaimed += demote_page_list(&demote_pages, pgdat, sc);
+	/* Pages that could not be demoted are still in @demote_pages */
+	if (!list_empty(&demote_pages)) {
+		/* Pages which failed to demote go back on @page_list for retry: */
+		list_splice_init(&demote_pages, page_list);
+		do_demote_pass = false;
+		goto retry;
+	}
 
 	pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
-- 
2.30.2