Date: Tue, 20 Feb 2018 22:24:57 -0600
From: Dan Rue
To: Michal Hocko
Cc: Andrew Morton, linux-mm@kvack.org, Mike Kravetz, Naoya Horiguchi, LKML, Michal Hocko
Subject: Re: [PATCH 5/6] mm, hugetlb: further simplify hugetlb allocation API
Message-ID: <20180221042457.uolmhlmv5je5dqx7@xps>
References: <20180103093213.26329-1-mhocko@kernel.org> <20180103093213.26329-6-mhocko@kernel.org>
In-Reply-To: <20180103093213.26329-6-mhocko@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: NeoMutt/20171215
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 03, 2018 at 10:32:12AM +0100, Michal Hocko wrote:
> From: Michal Hocko
> 
> The hugetlb allocator has several layers of allocation functions
> depending on the purpose of the allocation. There are two allocators,
> depending on whether the page can be allocated from the page allocator
> or whether we need a contiguous allocator. This is currently open coded
> in alloc_fresh_huge_page, which is the only path that might allocate
> giga pages, which require the latter allocator.
> Create alloc_fresh_huge_page, which hides this
> implementation detail, and use it in all callers which hardcoded the
> buddy allocator path (__hugetlb_alloc_buddy_huge_page). This shouldn't
> introduce any functional change because both the migration and surplus
> allocators exclude giga pages explicitly.
> 
> While we are at it, let's do some renaming. The current scheme is not
> consistent and is overly painful to read and understand. Get rid of the
> prefix underscores from most functions. There is no real reason to make
> names longer.
> * alloc_fresh_huge_page is the new layer to abstract the underlying
>   allocator
> * __hugetlb_alloc_buddy_huge_page becomes the shorter and neater
>   alloc_buddy_huge_page
> * the former alloc_fresh_huge_page becomes alloc_pool_huge_page
>   because we put the new page directly into the pool
> * alloc_surplus_huge_page can drop the open coded prep_new_huge_page
>   code as it uses alloc_fresh_huge_page now
> * others lose their excessive prefix underscores to make names shorter

Hi Michal -

We (Linaro) run the libhugetlbfs test suite continuously against
mainline, and recently (Feb 1) the 'counters' test started failing with
the following error:

root@localhost:~# mount_point="/mnt/hugetlb/"
root@localhost:~# echo 200 > /proc/sys/vm/nr_hugepages
root@localhost:~# mkdir -p "${mount_point}"
root@localhost:~# mount -t hugetlbfs hugetlbfs "${mount_point}"
root@localhost:~# export LD_LIBRARY_PATH=/root/libhugetlbfs/libhugetlbfs-2.20/obj64
root@localhost:~# /root/libhugetlbfs/libhugetlbfs-2.20/tests/obj64/counters
Starting testcase "/root/libhugetlbfs/libhugetlbfs-2.20/tests/obj64/counters", pid 3319
Base pool size: 0
Clean... FAIL
Line 326: Bad HugePages_Total: expected 0, actual 1

Line 326 refers to the test source at
https://github.com/libhugetlbfs/libhugetlbfs/blob/master/tests/counters.c#L326

I bisected the failure to this commit. The problem is seen on multiple
architectures (tested x86-64 and arm64).
Thanks,
Dan

> 
> Reviewed-by: Mike Kravetz
> Reviewed-by: Naoya Horiguchi
> Signed-off-by: Michal Hocko
> ---
>  mm/hugetlb.c | 78 ++++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 42 insertions(+), 36 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7dc80cbe8e89..60acd3e93a95 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1378,7 +1378,7 @@ pgoff_t __basepage_index(struct page *page)
>  	return (index << compound_order(page_head)) + compound_idx;
>  }
>  
> -static struct page *__hugetlb_alloc_buddy_huge_page(struct hstate *h,
> +static struct page *alloc_buddy_huge_page(struct hstate *h,
>  		gfp_t gfp_mask, int nid, nodemask_t *nmask)
>  {
>  	int order = huge_page_order(h);
> @@ -1396,34 +1396,49 @@ static struct page *__hugetlb_alloc_buddy_huge_page(struct hstate *h,
>  	return page;
>  }
>  
> +/*
> + * Common helper to allocate a fresh hugetlb page. All specific allocators
> + * should use this function to get new hugetlb pages
> + */
> +static struct page *alloc_fresh_huge_page(struct hstate *h,
> +		gfp_t gfp_mask, int nid, nodemask_t *nmask)
> +{
> +	struct page *page;
> +
> +	if (hstate_is_gigantic(h))
> +		page = alloc_gigantic_page(h, gfp_mask, nid, nmask);
> +	else
> +		page = alloc_buddy_huge_page(h, gfp_mask,
> +				nid, nmask);
> +	if (!page)
> +		return NULL;
> +
> +	if (hstate_is_gigantic(h))
> +		prep_compound_gigantic_page(page, huge_page_order(h));
> +	prep_new_huge_page(h, page, page_to_nid(page));
> +
> +	return page;
> +}
> +
>  /*
>   * Allocates a fresh page to the hugetlb allocator pool in the node interleaved
>   * manner.
>   */
> -static int alloc_fresh_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
> +static int alloc_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
>  {
>  	struct page *page;
>  	int nr_nodes, node;
>  	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
>  
>  	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
> -		if (hstate_is_gigantic(h))
> -			page = alloc_gigantic_page(h, gfp_mask,
> -					node, nodes_allowed);
> -		else
> -			page = __hugetlb_alloc_buddy_huge_page(h, gfp_mask,
> -					node, nodes_allowed);
> +		page = alloc_fresh_huge_page(h, gfp_mask, node, nodes_allowed);
>  		if (page)
>  			break;
> -
>  	}
>  
>  	if (!page)
>  		return 0;
>  
> -	if (hstate_is_gigantic(h))
> -		prep_compound_gigantic_page(page, huge_page_order(h));
> -	prep_new_huge_page(h, page, page_to_nid(page));
>  	put_page(page); /* free it into the hugepage allocator */
>  
>  	return 1;
> @@ -1537,7 +1552,7 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  /*
>   * Allocates a fresh surplus page from the page allocator.
>   */
> -static struct page *__alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
> +static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
>  		int nid, nodemask_t *nmask)
>  {
>  	struct page *page = NULL;
> @@ -1550,7 +1565,7 @@ static struct page *__alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
>  		goto out_unlock;
>  	spin_unlock(&hugetlb_lock);
>  
> -	page = __hugetlb_alloc_buddy_huge_page(h, gfp_mask, nid, nmask);
> +	page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask);
>  	if (!page)
>  		goto out_unlock;
>  
> @@ -1567,16 +1582,8 @@ static struct page *__alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
>  		put_page(page);
>  		page = NULL;
>  	} else {
> -		int r_nid;
> -
>  		h->surplus_huge_pages++;
> -		h->nr_huge_pages++;
> -		INIT_LIST_HEAD(&page->lru);
> -		r_nid = page_to_nid(page);
> -		set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> -		set_hugetlb_cgroup(page, NULL);
> -		h->nr_huge_pages_node[r_nid]++;
> -		h->surplus_huge_pages_node[r_nid]++;
> +		h->nr_huge_pages_node[page_to_nid(page)]++;
>  	}
>  
>  out_unlock:
> @@ -1585,7 +1592,7 @@ static struct page *__alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
>  	return page;
>  }
>  
> -static struct page *__alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
> +static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
>  		int nid, nodemask_t *nmask)
>  {
>  	struct page *page;
> @@ -1593,7 +1600,7 @@ static struct page *__alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
>  	if (hstate_is_gigantic(h))
>  		return NULL;
>  
> -	page = __hugetlb_alloc_buddy_huge_page(h, gfp_mask, nid, nmask);
> +	page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask);
>  	if (!page)
>  		return NULL;
>  
> @@ -1601,7 +1608,6 @@ static struct page *__alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
>  	 * We do not account these pages as surplus because they are only
>  	 * temporary and will be released properly on the last reference
>  	 */
> -	prep_new_huge_page(h, page, page_to_nid(page));
>  	SetPageHugeTemporary(page);
>  
>  	return page;
> @@ -1611,7 +1617,7 @@ static struct page *__alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
>   * Use the VMA's mpolicy to allocate a huge page from the buddy.
>   */
>  static
> -struct page *__alloc_buddy_huge_page_with_mpol(struct hstate *h,
> +struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
>  		struct vm_area_struct *vma, unsigned long addr)
>  {
>  	struct page *page;
> @@ -1621,7 +1627,7 @@ struct page *__alloc_buddy_huge_page_with_mpol(struct hstate *h,
>  	nodemask_t *nodemask;
>  
>  	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
> -	page = __alloc_surplus_huge_page(h, gfp_mask, nid, nodemask);
> +	page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask);
>  	mpol_cond_put(mpol);
>  
>  	return page;
> @@ -1642,7 +1648,7 @@ struct page *alloc_huge_page_node(struct hstate *h, int nid)
>  	spin_unlock(&hugetlb_lock);
>  
>  	if (!page)
> -		page = __alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
> +		page = alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
>  
>  	return page;
>  }
> @@ -1665,7 +1671,7 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  	}
>  	spin_unlock(&hugetlb_lock);
>  
> -	return __alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
> +	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
>  }
>  
>  /*
> @@ -1693,7 +1699,7 @@ static int gather_surplus_pages(struct hstate *h, int delta)
>  retry:
>  	spin_unlock(&hugetlb_lock);
>  	for (i = 0; i < needed; i++) {
> -		page = __alloc_surplus_huge_page(h, htlb_alloc_mask(h),
> +		page = alloc_surplus_huge_page(h, htlb_alloc_mask(h),
>  				NUMA_NO_NODE, NULL);
>  		if (!page) {
>  			alloc_ok = false;
> @@ -2030,7 +2036,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, gbl_chg);
>  	if (!page) {
>  		spin_unlock(&hugetlb_lock);
> -		page = __alloc_buddy_huge_page_with_mpol(h, vma, addr);
> +		page = alloc_buddy_huge_page_with_mpol(h, vma, addr);
>  		if (!page)
>  			goto out_uncharge_cgroup;
>  		if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
> @@ -2170,7 +2176,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
>  		if (hstate_is_gigantic(h)) {
>  			if (!alloc_bootmem_huge_page(h))
>  				break;
> -		} else if (!alloc_fresh_huge_page(h,
> +		} else if (!alloc_pool_huge_page(h,
>  					 &node_states[N_MEMORY]))
>  			break;
>  		cond_resched();
> @@ -2290,7 +2296,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
>  	 * First take pages out of surplus state. Then make up the
>  	 * remaining difference by allocating fresh huge pages.
>  	 *
> -	 * We might race with __alloc_surplus_huge_page() here and be unable
> +	 * We might race with alloc_surplus_huge_page() here and be unable
>  	 * to convert a surplus huge page to a normal huge page. That is
>  	 * not critical, though, it just means the overall size of the
>  	 * pool might be one hugepage larger than it needs to be, but
> @@ -2313,7 +2319,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
>  		/* yield cpu to avoid soft lockup */
>  		cond_resched();
>  
> -		ret = alloc_fresh_huge_page(h, nodes_allowed);
> +		ret = alloc_pool_huge_page(h, nodes_allowed);
>  		spin_lock(&hugetlb_lock);
>  		if (!ret)
>  			goto out;
> @@ -2333,7 +2339,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
>  	 * By placing pages into the surplus state independent of the
>  	 * overcommit value, we are allowing the surplus pool size to
>  	 * exceed overcommit. There are few sane options here. Since
> -	 * __alloc_surplus_huge_page() is checking the global counter,
> +	 * alloc_surplus_huge_page() is checking the global counter,
>  	 * though, we'll note that we're not allowed to exceed surplus
>  	 * and won't grow the pool anywhere else. Not until one of the
>  	 * sysctls are changed, or the surplus pages go out of use.
> -- 
> 2.15.1
> 