Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756042AbbFOSnN (ORCPT ); Mon, 15 Jun 2015 14:43:13 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:28394 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755810AbbFOSnK (ORCPT ); Mon, 15 Jun 2015 14:43:10 -0400 Message-ID: <557F1C91.9080904@oracle.com> Date: Mon, 15 Jun 2015 11:42:25 -0700 From: Mike Kravetz User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Naoya Horiguchi CC: "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , Dave Hansen , David Rientjes , Hugh Dickins , Davidlohr Bueso , Aneesh Kumar , Hillf Danton , Christoph Hellwig Subject: Re: [RFC v4 PATCH 6/9] mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate References: <1434056500-2434-1-git-send-email-mike.kravetz@oracle.com> <1434056500-2434-7-git-send-email-mike.kravetz@oracle.com> <20150615063444.GA26050@hori1.linux.bs1.fc.nec.co.jp> In-Reply-To: <20150615063444.GA26050@hori1.linux.bs1.fc.nec.co.jp> Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-Source-IP: userv0022.oracle.com [156.151.31.74] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5115 Lines: 134 On 06/14/2015 11:34 PM, Naoya Horiguchi wrote: > On Thu, Jun 11, 2015 at 02:01:37PM -0700, Mike Kravetz wrote: >> Areas hole punched by fallocate will not have entries in the >> region/reserve map. However, shared mappings with min_size subpool >> reservations may still have reserved pages. alloc_huge_page needs >> to handle this special case and do the proper accounting. >> >> Signed-off-by: Mike Kravetz >> --- >> mm/hugetlb.c | 48 +++++++++++++++++++++++++++--------------------- >> 1 file changed, 27 insertions(+), 21 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index ecbaffe..9c295c9 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -692,19 +692,9 @@ static int vma_has_reserves(struct vm_area_struct *vma, long chg) >> return 0; >> } >> >> - if (vma->vm_flags & VM_MAYSHARE) { >> - /* >> - * We know VM_NORESERVE is not set. Therefore, there SHOULD >> - * be a region map for all pages. The only situation where >> - * there is no region map is if a hole was punched via >> - * fallocate. In this case, there really are no reverves to >> - * use. This situation is indicated if chg != 0. >> - */ >> - if (chg) >> - return 0; >> - else >> - return 1; >> - } >> + /* Shared mappings always use reserves */ >> + if (vma->vm_flags & VM_MAYSHARE) >> + return 1; > > This change completely reverts 5/9, so can you omit 5/9? That was a mistake. This change should not be in the patch. The change from 5/9 needs to remain. Sorry for confusion. Thanks for catching. >> /* >> * Only the process that called mmap() has reserves for >> @@ -1601,6 +1591,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, >> struct hstate *h = hstate_vma(vma); >> struct page *page; >> long chg, commit; >> + long gbl_chg; >> int ret, idx; >> struct hugetlb_cgroup *h_cg; >> >> @@ -1608,24 +1599,39 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, >> /* >> * Processes that did not create the mapping will have no >> * reserves and will not have accounted against subpool >> - * limit. Check that the subpool limit can be made before >> - * satisfying the allocation MAP_NORESERVE mappings may also >> - * need pages and subpool limit allocated allocated if no reserve >> - * mapping overlaps. >> + * limit. Check that the subpool limit will not be exceeded >> + * before performing the allocation. Allocations for >> + * MAP_NORESERVE mappings also need to be checked against >> + * any subpool limit. >> + * >> + * NOTE: Shared mappings with holes punched via fallocate >> + * may still have reservations, even without entries in the >> + * reserve map as indicated by vma_needs_reservation. This >> + * would be the case if hugepage_subpool_get_pages returns >> + * zero to indicate no changes to the global reservation count >> + * are necessary. In this case, pass the output of >> + * hugepage_subpool_get_pages (zero) to dequeue_huge_page_vma >> + * so that the page is not counted against the global limit. >> + * For MAP_NORESERVE mappings always pass the output of >> + * vma_needs_reservation. For race detection and error cleanup >> + * use output of vma_needs_reservation as well. >> */ >> - chg = vma_needs_reservation(h, vma, addr); >> + chg = gbl_chg = vma_needs_reservation(h, vma, addr); >> if (chg < 0) >> return ERR_PTR(-ENOMEM); >> - if (chg || avoid_reserve) >> - if (hugepage_subpool_get_pages(spool, 1) < 0) >> + if (chg || avoid_reserve) { >> + gbl_chg = hugepage_subpool_get_pages(spool, 1); >> + if (gbl_chg < 0) >> return ERR_PTR(-ENOSPC); >> + } >> >> ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); >> if (ret) >> goto out_subpool_put; >> >> spin_lock(&hugetlb_lock); >> - page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, chg); >> + page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, >> + avoid_reserve ? chg : gbl_chg); > > You use chg or gbl_chg depending on avoid_reserve here, and below this line > there's code like below > > commit = vma_commit_reservation(h, vma, addr); > if (unlikely(chg > commit)) { > ... > } > > This also need to be changed to use chg or gbl_chg depending on avoid_reserve? It should use chg only. I attempted to address this at the end of the Note above. " For race detection and error cleanup use output of vma_needs_reservation as well." I will add more comments to make it clear. > # I feel that this reserve-handling code in alloc_huge_page() is too complicated > # and hard to understand, so some cleanup like separating reserve parts into > # other new routine(s) might be helpful... I agree, let me think about ways to split this up and hopefully make it easier to understand. -- Mike Kravetz > > Thanks, > Naoya Horiguchi > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/