From: Zi Yan
To: Dave Hansen, Yang Shi, Keith Busch, Fengguang Wu, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Daniel Jordan, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, Javier Cabezas, David Nellans, Zi Yan
Shutemov" , Andrew Morton , Vlastimil Babka , Mel Gorman , John Hubbard , Mark Hairgrove , Nitin Gupta , Javier Cabezas , David Nellans , Zi Yan Subject: [RFC PATCH 25/25] memory manage: use exchange pages to memory manage to improve throughput. Date: Wed, 3 Apr 2019 19:00:46 -0700 Message-Id: <20190404020046.32741-26-zi.yan@sent.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190404020046.32741-1-zi.yan@sent.com> References: <20190404020046.32741-1-zi.yan@sent.com> Reply-To: ziy@nvidia.com MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Zi Yan 1. Exclude file-backed base pages from exchanging. 2. Split THP in exchange pages if THP support is disabled. 3. if THP migration is supported, only exchange THPs. Signed-off-by: Zi Yan --- mm/memory_manage.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 173 insertions(+) diff --git a/mm/memory_manage.c b/mm/memory_manage.c index 8b76fcf..d3d07b7 100644 --- a/mm/memory_manage.c +++ b/mm/memory_manage.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -253,6 +254,147 @@ static int putback_overflow_pages(unsigned long max_nr_base_pages, huge_page_list, nr_huge_pages); } +static int add_pages_to_exchange_list(struct list_head *from_pagelist, + struct list_head *to_pagelist, struct exchange_page_info *info_list, + struct list_head *exchange_list, unsigned long info_list_size) +{ + unsigned long info_list_index = 0; + LIST_HEAD(failed_from_list); + LIST_HEAD(failed_to_list); + + while (!list_empty(from_pagelist) && !list_empty(to_pagelist)) { + struct page *from_page, *to_page; + struct exchange_page_info *one_pair = &info_list[info_list_index]; + int rc; + + from_page = list_first_entry_or_null(from_pagelist, struct page, lru); + to_page = list_first_entry_or_null(to_pagelist, struct page, lru); + + if (!from_page || !to_page) + break; + + if (!thp_migration_supported() && PageTransHuge(from_page)) { + lock_page(from_page); + rc = split_huge_page_to_list(from_page, &from_page->lru); + unlock_page(from_page); + if (rc) { + list_move(&from_page->lru, &failed_from_list); + continue; + } + } + + if (!thp_migration_supported() && PageTransHuge(to_page)) { + lock_page(to_page); + rc = split_huge_page_to_list(to_page, &to_page->lru); + unlock_page(to_page); + if (rc) { + list_move(&to_page->lru, &failed_to_list); + continue; + } + } + + if (hpage_nr_pages(from_page) != hpage_nr_pages(to_page)) { + if (!(hpage_nr_pages(from_page) == 1 && hpage_nr_pages(from_page) == HPAGE_PMD_NR)) { + list_del(&from_page->lru); + list_add(&from_page->lru, &failed_from_list); + } + if (!(hpage_nr_pages(to_page) == 1 && hpage_nr_pages(to_page) == HPAGE_PMD_NR)) { + list_del(&to_page->lru); + list_add(&to_page->lru, &failed_to_list); + } + continue; + } + + /* Exclude file-backed pages, exchange it concurrently is not + * implemented yet. 
+		 */
+		if (page_mapping(from_page)) {
+			list_del(&from_page->lru);
+			list_add(&from_page->lru, &failed_from_list);
+			continue;
+		}
+		if (page_mapping(to_page)) {
+			list_del(&to_page->lru);
+			list_add(&to_page->lru, &failed_to_list);
+			continue;
+		}
+
+		list_del(&from_page->lru);
+		list_del(&to_page->lru);
+
+		one_pair->from_page = from_page;
+		one_pair->to_page = to_page;
+
+		list_add_tail(&one_pair->list, exchange_list);
+
+		info_list_index++;
+		if (info_list_index >= info_list_size)
+			break;
+	}
+	list_splice(&failed_from_list, from_pagelist);
+	list_splice(&failed_to_list, to_pagelist);
+
+	return info_list_index;
+}
+
+static unsigned long exchange_pages_between_nodes(unsigned long nr_from_pages,
+	unsigned long nr_to_pages, struct list_head *from_page_list,
+	struct list_head *to_page_list, int batch_size,
+	bool huge_page, enum migrate_mode mode)
+{
+	struct exchange_page_info *info_list;
+	unsigned long info_list_size = min_t(unsigned long,
+		nr_from_pages, nr_to_pages) / (huge_page?HPAGE_PMD_NR:1);
+	unsigned long added_size = 0;
+	bool migrate_concur = mode & MIGRATE_CONCUR;
+	LIST_HEAD(exchange_list);
+
+	/* non concurrent does not need to split into batches */
+	if (!migrate_concur || batch_size <= 0)
+		batch_size = info_list_size;
+
+	/* prepare for huge page split */
+	if (!thp_migration_supported() && huge_page) {
+		batch_size = batch_size * HPAGE_PMD_NR;
+		info_list_size = info_list_size * HPAGE_PMD_NR;
+	}
+
+	info_list = kvzalloc(sizeof(struct exchange_page_info)*batch_size,
+		GFP_KERNEL);
+	if (!info_list)
+		return 0;
+
+	while (!list_empty(from_page_list) && !list_empty(to_page_list)) {
+		unsigned long nr_added_pages;
+		INIT_LIST_HEAD(&exchange_list);
+
+		nr_added_pages = add_pages_to_exchange_list(from_page_list, to_page_list,
+			info_list, &exchange_list, batch_size);
+
+		/*
+		 * Nothing to exchange, we bail out.
+		 *
+		 * In case from_page_list and to_page_list both only have file-backed
+		 * pages left */
+		if (!nr_added_pages)
+			break;
+
+		added_size += nr_added_pages;
+
+		VM_BUG_ON(added_size > info_list_size);
+
+		if (migrate_concur)
+			exchange_pages_concur(&exchange_list, mode, MR_SYSCALL);
+		else
+			exchange_pages(&exchange_list, mode, MR_SYSCALL);
+
+		memset(info_list, 0, sizeof(struct exchange_page_info)*batch_size);
+	}
+
+	kvfree(info_list);
+
+	return info_list_size;
+}
+
 static int do_mm_manage(struct task_struct *p, struct mm_struct *mm,
 		const nodemask_t *slow, const nodemask_t *fast,
 		unsigned long nr_pages, int flags)
@@ -261,6 +403,7 @@ static int do_mm_manage(struct task_struct *p, struct mm_struct *mm,
 	bool migrate_concur = flags & MPOL_MF_MOVE_CONCUR;
 	bool migrate_dma = flags & MPOL_MF_MOVE_DMA;
 	bool move_hot_and_cold_pages = flags & MPOL_MF_MOVE_ALL;
+	bool migrate_exchange_pages = flags & MPOL_MF_EXCHANGE;
 	struct mem_cgroup *memcg = mem_cgroup_from_task(p);
 	int err = 0;
 	unsigned long nr_isolated_slow_pages;
@@ -338,6 +481,35 @@ static int do_mm_manage(struct task_struct *p, struct mm_struct *mm,
 			&nr_isolated_fast_base_pages, &nr_isolated_fast_huge_pages,
 			move_hot_and_cold_pages?ISOLATE_HOT_AND_COLD_PAGES:ISOLATE_COLD_PAGES);
 
+	if (migrate_exchange_pages) {
+		unsigned long nr_exchange_pages;
+
+		/*
+		 * base pages can include file-backed ones, we do not handle them
+		 * at the moment
+		 */
+		if (!thp_migration_supported()) {
+			nr_exchange_pages = exchange_pages_between_nodes(nr_isolated_slow_base_pages,
+				nr_isolated_fast_base_pages, &slow_base_page_list,
+				&fast_base_page_list, migration_batch_size, false, mode);
+
+			nr_isolated_fast_base_pages -= nr_exchange_pages;
+		}
+
+		/* THP page exchange */
+		nr_exchange_pages = exchange_pages_between_nodes(nr_isolated_slow_huge_pages,
+			nr_isolated_fast_huge_pages, &slow_huge_page_list,
+			&fast_huge_page_list, migration_batch_size, true, mode);
+
+		/* split THP above, so we do not need to multiply the counter */
+		if (!thp_migration_supported())
+			nr_isolated_fast_huge_pages -= nr_exchange_pages;
+		else
+			nr_isolated_fast_huge_pages -= nr_exchange_pages * HPAGE_PMD_NR;
+
+		goto migrate_out;
+	} else {
+migrate_out:
 	/* Migrate pages to slow node */
 	/* No multi-threaded migration for base pages */
 	nr_isolated_fast_base_pages -=
@@ -347,6 +519,7 @@ static int do_mm_manage(struct task_struct *p, struct mm_struct *mm,
 		nr_isolated_fast_huge_pages -=
 			migrate_to_node(&fast_huge_page_list, slow_nid, mode,
 				migration_batch_size);
+	}
 	}
 
 	if (nr_isolated_fast_base_pages != ULONG_MAX &&
-- 
2.7.4
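
A condensed, illustrative restatement (not part of the patch) of the pairing rules that add_pages_to_exchange_list() applies above. The helper name pair_is_exchangeable() is hypothetical; the predicates it calls (page_mapping(), hpage_nr_pages(), thp_migration_supported(), PageTransHuge()) are the same ones the patch already uses.

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/huge_mm.h>

/* Sketch only: summarizes when a from/to pair is queued for exchange. */
static bool pair_is_exchangeable(struct page *from, struct page *to)
{
	/* Rule 1: only anonymous pages; file-backed pages are excluded for now. */
	if (page_mapping(from) || page_mapping(to))
		return false;

	/* Rule 2: both sides must be the same size (base page with base page,
	 * THP with THP), so the exchange copies equal amounts of data. */
	if (hpage_nr_pages(from) != hpage_nr_pages(to))
		return false;

	/* Rule 3: without THP migration support, THPs are split before
	 * pairing, so an unsplit THP cannot be queued. */
	if (!thp_migration_supported() &&
	    (PageTransHuge(from) || PageTransHuge(to)))
		return false;

	return true;
}

In the patch itself the splitting happens before this check and mismatched pairs are parked on the failed lists rather than rejected outright, so the sketch only captures the common path.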