From: Naoya Horiguchi
To: Michal Hocko
CC: Naoya Horiguchi, Mike Kravetz, linux-mm@kvack.org, Andrew Morton, LKML
Subject: Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
Date: Fri, 22 Dec 2017 08:58:48 +0000
Message-ID: <7cf6978c-5bf2-cbe4-6f7f-ba09998f482d@ah.jp.nec.com>
In-Reply-To: <20171220095328.GG4831@dhcp22.suse.cz>

On 12/20/2017 06:53 PM, Michal Hocko wrote:
> On Wed 20-12-17 05:33:36, Naoya Horiguchi wrote:
>>
>> On 12/15/2017 06:33 PM, Michal Hocko wrote:
>>> Naoya,
>>> this has passed Mike's review (thanks for that!), you have mentioned
>>> that you can pass this through your testing machinery earlier. While
>>> I've done some testing already I would really appreciate if you could
>>> do that as well. Review would be highly appreciated as well.
>>
>> Sorry for my slow response. I reviewed/tested this patchset and it looks
>> good to me overall.
>
> No need to feel sorry. This doesn't have an urgent priority. Thanks for
> the review and testing. Can I assume your {Reviewed,Acked}-by or
> Tested-by?

Yes, I tested again with the additional changes below, and hugetlb migration
works fine from mbind(2). Thank you very much for your work.
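
(For illustration only: the mbind(2) exercise above amounts to rebinding an
already-populated hugetlb mapping to another node with MPOL_MF_MOVE, which
drives the do_mbind() -> migrate_pages() -> new_page() path discussed below.
The sketch that follows is not the exact test used here; the 2MB page size,
the target node number, and the trimmed error handling are assumptions.)

/*
 * Illustrative sketch only (not the actual test case): move a populated
 * hugetlb mapping to node 1 via mbind(2), exercising the
 * do_mbind() -> migrate_pages() -> new_page() path.
 * Assumes 2MB huge pages are configured and node 1 exists.
 * Build with: gcc -o hugetlb-mbind hugetlb-mbind.c -lnuma
 */
#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE	(2UL << 20)

int main(void)
{
	unsigned long target_node = 1UL << 1;	/* nodemask with only node 1 set */
	char *p;

	p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 1, HPAGE_SIZE);	/* fault the huge page in on the current node */

	/* rebind the range to node 1 and ask for the existing page to be migrated */
	if (mbind(p, HPAGE_SIZE, MPOL_BIND, &target_node,
		  8 * sizeof(target_node), MPOL_MF_MOVE | MPOL_MF_STRICT)) {
		perror("mbind");
		return 1;
	}
	return 0;
}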
Reviewed-by: Naoya Horiguchi for the series.

Thanks,
Naoya Horiguchi

>> I have one comment on the code path from mbind(2).
>> The callback passed to migrate_pages() in do_mbind() (i.e. new_page())
>> calls alloc_huge_page_noerr(), which currently doesn't call SetPageHugeTemporary(),
>> so hugetlb migration fails when h->surplus_huge_pages >= h->nr_overcommit_huge_pages.
>
> Yes, I am aware of that. I should have been more explicit in the
> changelog. Sorry about that and thanks for pointing it out explicitly.
> To be honest I wasn't really sure what to do about this. The code path
> is really complex and it made my head spin. I fail to see why we have to
> call alloc_huge_page and mess with reservations at all.
>
>> I don't think this is a bug, but it would be better if mbind(2) worked
>> more similarly to other migration callers like move_pages(2)/migrate_pages(2).
>
> If the fix is as easy as the following I will add it to the pile.
> Otherwise I would prefer to do this separately after I find some more
> time to understand the call path.
> ---
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e035002d3fb6..08a4af411e25 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -345,10 +345,9 @@ struct huge_bootmem_page {
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>  				unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_node(struct hstate *h, int nid);
> -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> -				unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  				nodemask_t *nmask);
> +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address);
>  int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>  			pgoff_t idx);
>  
> @@ -526,7 +525,7 @@ struct hstate {};
>  #define alloc_huge_page(v, a, r) NULL
>  #define alloc_huge_page_node(h, nid) NULL
>  #define alloc_huge_page_nodemask(h, preferred_nid, nmask) NULL
> -#define alloc_huge_page_noerr(v, a, r) NULL
> +#define alloc_huge_page_vma(vma, address) NULL
>  #define alloc_bootmem_huge_page(h) NULL
>  #define hstate_file(f) NULL
>  #define hstate_sizelog(s) NULL
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4426c5b23a20..e00deabe6d17 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1672,6 +1672,25 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
>  }
>  
> +/* mempolicy aware migration callback */
> +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address)
> +{
> +	struct mempolicy *mpol;
> +	nodemask_t *nodemask;
> +	struct page *page;
> +	struct hstate *h;
> +	gfp_t gfp_mask;
> +	int node;
> +
> +	h = hstate_vma(vma);
> +	gfp_mask = htlb_alloc_mask(h);
> +	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
> +	page = alloc_huge_page_nodemask(h, node, nodemask);
> +	mpol_cond_put(mpol);
> +
> +	return page;
> +}
> +
>  /*
>   * Increase the hugetlb pool such that it can accommodate a reservation
>   * of size 'delta'.
> @@ -2077,20 +2096,6 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	return ERR_PTR(-ENOSPC);
>  }
>  
> -/*
> - * alloc_huge_page()'s wrapper which simply returns the page if allocation
> - * succeeds, otherwise NULL. This function is called from new_vma_page(),
> - * where no ERR_VALUE is expected to be returned.
> - */
> -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> -		unsigned long addr, int avoid_reserve)
> -{
> -	struct page *page = alloc_huge_page(vma, addr, avoid_reserve);
> -	if (IS_ERR(page))
> -		page = NULL;
> -	return page;
> -}
> -
>  int alloc_bootmem_huge_page(struct hstate *h)
>  	__attribute__ ((weak, alias("__alloc_bootmem_huge_page")));
>  int __alloc_bootmem_huge_page(struct hstate *h)
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f604b22ebb65..96823fa07f38 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1121,8 +1121,7 @@ static struct page *new_page(struct page *page, unsigned long start, int **x)
>  	}
>  
>  	if (PageHuge(page)) {
> -		BUG_ON(!vma);
> -		return alloc_huge_page_noerr(vma, address, 1);
> +		return alloc_huge_page_vma(vma, address);
>  	} else if (thp_migration_supported() && PageTransHuge(page)) {
>  		struct page *thp;
>  
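
(A follow-up note on reproducing the condition mentioned above: the counters
involved in the surplus/overcommit check are exported per hstate under sysfs,
so they can be compared before and after the mbind(2) run. The helper below is
only a sketch; the hugepages-2048kB directory assumes the default 2MB huge
page size on x86_64, other sizes live under their own hugepages-<size>kB
directory.)

/*
 * Dump the per-hstate counters relevant to the surplus/overcommit
 * condition discussed above (surplus_hugepages vs. nr_overcommit_hugepages).
 * Assumes the default 2MB hstate.
 */
#include <stdio.h>

static long read_counter(const char *name)
{
	char path[256];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/kernel/mm/hugepages/hugepages-2048kB/%s", name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	const char *names[] = {
		"nr_hugepages", "free_hugepages", "resv_hugepages",
		"surplus_hugepages", "nr_overcommit_hugepages",
	};

	for (unsigned int i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		printf("%-26s %ld\n", names[i], read_counter(names[i]));
	return 0;
}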