From: Naoya Horiguchi
To: Zi Yan
Cc: Anshuman Khandual, Zi Yan, kirill.shutemov@linux.intel.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    akpm@linux-foundation.org, minchan@kernel.org, vbabka@suse.cz,
    mgorman@techsingularity.net, mhocko@kernel.org, dnellans@nvidia.com
Subject: Re: [PATCH v5 08/11] mm: hwpoison: soft offline supports thp migration
Date: Thu, 27 Apr 2017 04:41:13 +0000
Message-ID: <20170427044112.GA18781@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To: <58FA2B85.5040904@cs.rutgers.edu>
References: <20170420204752.79703-1-zi.yan@sent.com>
 <20170420204752.79703-9-zi.yan@sent.com>
 <62d7eea3-96c8-3230-3e1b-fdc2bfbea6bd@linux.vnet.ibm.com>
 <58FA2B85.5040904@cs.rutgers.edu>

On Fri, Apr 21, 2017 at 10:55:49AM -0500, Zi Yan wrote:
>
> Anshuman Khandual wrote:
> > On 04/21/2017 02:17 AM, Zi Yan wrote:
> >> From: Naoya Horiguchi
> >>
> >> This patch enables thp migration for soft offline.
> >>
> >> Signed-off-by: Naoya Horiguchi
> >>
> >> ChangeLog: v1 -> v5:
> >> - fix page isolation counting error
> >>
> >> Signed-off-by: Zi Yan
> >> ---
> >>  mm/memory-failure.c | 35 ++++++++++++++---------------------
> >>  1 file changed, 14 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 9b77476ef31f..23ff02eb3ed4 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1481,7 +1481,17 @@ static struct page *new_page(struct page *p, unsigned long private, int **x)
> >>  	if (PageHuge(p))
> >>  		return alloc_huge_page_node(page_hstate(compound_head(p)),
> >>  						   nid);
> >> -	else
> >> +	else if (thp_migration_supported() && PageTransHuge(p)) {
> >> +		struct page *thp;
> >> +
> >> +		thp = alloc_pages_node(nid,
> >> +			(GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
> >
> > Why not __GFP_RECLAIM? It's the soft offline path, so we can wait a
> > bit before declaring that the THP page cannot be allocated, and hence
> > should invoke reclaim methods as well.
>
> I am not sure how much effort the kernel wants to put here to soft
> offline a THP. Naoya knows more here.

What I thought at first was that soft offline is not an urgent user, so
there is no need to reclaim (i.e. it should have as little impact on
other threads as possible). But that's not a strong opinion, so if you
like __GFP_RECLAIM here, I'm fine with that.

>
> >
> >> +			HPAGE_PMD_ORDER);
> >> +		if (!thp)
> >> +			return NULL;
> >> +		prep_transhuge_page(thp);
> >> +		return thp;
> >> +	} else
> >>  		return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
> >>  }
> >>
> >> @@ -1665,8 +1675,8 @@ static int __soft_offline_page(struct page *page, int flags)
> >>  	 * cannot have PAGE_MAPPING_MOVABLE.
> >>  	 */
> >>  	if (!__PageMovable(page))
> >> -		inc_node_page_state(page, NR_ISOLATED_ANON +
> >> -					page_is_file_cache(page));
> >> +		mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
> >> +				page_is_file_cache(page), hpage_nr_pages(page));
> >>  	list_add(&page->lru, &pagelist);
> >>  	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> >>  				MIGRATE_SYNC, MR_MEMORY_FAILURE);
> >> @@ -1689,28 +1699,11 @@ static int __soft_offline_page(struct page *page, int flags)
> >>  static int soft_offline_in_use_page(struct page *page, int flags)
> >>  {
> >>  	int ret;
> >> -	struct page *hpage = compound_head(page);
> >> -
> >> -	if (!PageHuge(page) && PageTransHuge(hpage)) {
> >> -		lock_page(hpage);
> >> -		if (!PageAnon(hpage) || unlikely(split_huge_page(hpage))) {
> >> -			unlock_page(hpage);
> >> -			if (!PageAnon(hpage))
> >> -				pr_info("soft offline: %#lx: non anonymous thp\n", page_to_pfn(page));
> >> -			else
> >> -				pr_info("soft offline: %#lx: thp split failed\n", page_to_pfn(page));
> >> -			put_hwpoison_page(hpage);
> >> -			return -EBUSY;
> >> -		}
> >> -		unlock_page(hpage);
> >> -		get_hwpoison_page(page);
> >> -		put_hwpoison_page(hpage);
> >> -	}
> >>
> >>  	if (PageHuge(page))
> >>  		ret = soft_offline_huge_page(page, flags);
> >>  	else
> >> -		ret = __soft_offline_page(page, flags);
> >> +		ret = __soft_offline_page(compound_head(page), flags);
> >
> > Hmm, what if the THP allocation fails in the new_page() path and we
> > fall back to general page allocation? In that case we will still be
> > calling with the head page, because we don't split the huge page any
> > more.
>
> This could be a problem: if the user wants to offline a TailPage, then
> due to THP allocation failure the HeadPage is offlined instead.

Right, the "retry with split" part is unfinished, so we need some
improvement.

>
> It may be better to only soft offline THPs if page ==
> compound_head(page). If page != compound_head(page), we still split THPs
> like before.
>
> Because in migrate_pages(), we cannot guarantee that any TailPages in
> that THP are migrated (1. THP allocation failure causes THP splitting,
> and then only the HeadPage is going to be migrated; 2. even if we change
> the existing migrate_pages() implementation to add all TailPages to the
> migration list instead of the LRU list, we still cannot guarantee that
> the TailPage we want to migrate is migrated.)
>
> Naoya, what do you think?

Maybe soft offline is a special caller of page migration because it
basically wants to migrate only one page, but thp migration still has a
benefit because we can avoid a thp split. So I like the idea of trying
thp migration first, and if it fails, falling back to splitting and
migrating (only) the raw error page. This should be done on the caller
side for soft offline, because the caller knows where the error page is.

As for the generic case (for other migration callers, which mainly want
to migrate multiple pages for their own purposes), thp split and retry
can be done in the common migration code. After a thp split, all subpages
are linked to the migration list, and then we retry without returning to
the caller. So I think that split_huge_page() can be moved to (for
example) the for-loop in migrate_pages(). I tried to write a patch for
it last year, but considering vm event accounting, the patch might be
large (~100 lines).

Thanks,
Naoya Horiguchi
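For illustration only, the caller-side fallback described above might look
roughly like the following kernel-style sketch. This is not a compilable
patch: the function name soft_offline_migrate() is hypothetical, locking,
refcounting (get_hwpoison_page/put_hwpoison_page), and most error handling
are elided, and the control flow is just one guess at how the retry could
be wired up on top of the migrate_pages() call already used in
__soft_offline_page():

```c
/*
 * Hypothetical sketch (not from the patch): try to migrate the THP as a
 * whole first; if that fails, split it and retry with only the raw
 * error page, as suggested in the discussion above.
 */
static int soft_offline_migrate(struct page *page, int flags)
{
	struct page *hpage = compound_head(page);
	LIST_HEAD(pagelist);
	int ret;

	/* First attempt: migrate the whole THP, avoiding a split. */
	list_add(&hpage->lru, &pagelist);
	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
			    MIGRATE_SYNC, MR_MEMORY_FAILURE);
	if (!ret)
		return 0;

	/* Fallback: split the THP, then migrate only the error page. */
	lock_page(hpage);
	ret = split_huge_page(hpage);
	unlock_page(hpage);
	if (ret)
		return -EBUSY;

	INIT_LIST_HEAD(&pagelist);
	list_add(&page->lru, &pagelist);	/* the raw error page */
	return migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
			     MIGRATE_SYNC, MR_MEMORY_FAILURE);
}
```

The point of structuring it this way is that only the soft offline caller
knows which subpage carries the error, so it can confine the post-split
retry to that single page instead of re-queuing the whole list.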