Date: Tue, 23 Jul 2013 15:37:04 +0200
From: Michal Hocko
To: Joonsoo Kim
Cc: Andrew Morton, Rik van Riel, Mel Gorman, "Aneesh Kumar K.V",
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim
Subject: Re: [PATCH v2 10/10] mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cache
Message-ID: <20130723133704.GD8677@dhcp22.suse.cz>
References: <1374482191-3500-1-git-send-email-iamjoonsoo.kim@lge.com>
	<1374482191-3500-11-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1374482191-3500-11-git-send-email-iamjoonsoo.kim@lge.com>

On Mon 22-07-13 17:36:31, Joonsoo Kim wrote:
> If a vma with VM_NORESERVE allocates a new page for the page cache, we
> should check whether this area is reserved or not. If this address is
> already reserved by another process (the chg == 0 case), we should
> decrement the reserve count, because this allocated page will go into
> the page cache, and currently there is no way to know whether the page
> came from the reserved pool when the inode is released. This can
> introduce an over-counting problem in the reserve count. With the
> following example code, you can easily reproduce this situation.
>
> 	size = 20 * MB;
> 	flag = MAP_SHARED;
> 	p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
> 	if (p == MAP_FAILED) {
> 		fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
> 		return -1;
> 	}
>
> 	flag = MAP_SHARED | MAP_NORESERVE;
> 	q = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
> 	if (q == MAP_FAILED) {
> 		fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
> 	}
> 	q[0] = 'c';
>
> This patch solves this problem.

Again, please describe _how_ it solves the problem.
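As an aside, the snippet is easier to exercise as a self-contained
program. A minimal sketch follows; the hugetlbfs mount point /mnt/huge,
the file name, and the page counts are illustrative assumptions (with
2MB huge pages, 20MB needs at least 10 pages preallocated via
nr_hugepages):

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define MB	(1024UL * 1024UL)

int main(void)
{
	/* Illustrative path: any file on a hugetlbfs mount will do. */
	int fd = open("/mnt/huge/test", O_CREAT | O_RDWR, 0644);
	size_t size = 20 * MB;
	char *p, *q;

	if (fd < 0) {
		fprintf(stderr, "open() failed: %s\n", strerror(errno));
		return -1;
	}

	/* First mapping: reserves 20MB worth of huge pages. */
	p = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
		return -1;
	}

	/*
	 * Second mapping of the same range with MAP_NORESERVE; faulting
	 * a page in through it allocates into the page cache for a range
	 * already reserved by the first mapping.
	 */
	q = mmap(NULL, size, PROT_READ|PROT_WRITE,
		 MAP_SHARED | MAP_NORESERVE, fd, 0);
	if (q == MAP_FAILED) {
		fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
		return -1;
	}
	q[0] = 'c';

	munmap(q, size);
	munmap(p, size);
	close(fd);
	return 0;
}

On an unpatched kernel, watching HugePages_Rsvd in /proc/meminfo across
a run (and after removing the file) should show the leaked reservation.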
> Reviewed-by: Wanpeng Li
> Reviewed-by: Aneesh Kumar K.V
> Signed-off-by: Joonsoo Kim
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 2ea6afd..6782b41 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -443,10 +443,23 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
>  }
>
>  /* Returns true if the VMA has associated reserve pages */
> -static int vma_has_reserves(struct vm_area_struct *vma)
> +static int vma_has_reserves(struct vm_area_struct *vma, long chg)
>  {
> -	if (vma->vm_flags & VM_NORESERVE)
> -		return 0;
> +	if (vma->vm_flags & VM_NORESERVE) {
> +		/*
> +		 * This address is already reserved by another process
> +		 * (chg == 0), so we should decrement the reserve count.
> +		 * Without decrementing it, the reserve count remains
> +		 * after releasing the inode, because this allocated page
> +		 * will go into the page cache and is regarded as coming
> +		 * from the reserved pool in the release step. Currently,
> +		 * we don't have any other solution to deal with this
> +		 * situation properly, so add a workaround here.
> +		 */
> +		if (vma->vm_flags & VM_MAYSHARE && chg == 0)
> +			return 1;
> +		else
> +			return 0;
> +	}
>
>  	/* Shared mappings always use reserves */
>  	if (vma->vm_flags & VM_MAYSHARE)
> @@ -520,7 +533,8 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
>
>  static struct page *dequeue_huge_page_vma(struct hstate *h,
>  				struct vm_area_struct *vma,
> -				unsigned long address, int avoid_reserve)
> +				unsigned long address, int avoid_reserve,
> +				long chg)
>  {
>  	struct page *page = NULL;
>  	struct mempolicy *mpol;
> @@ -535,7 +549,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>  	 * have no page reserves. This check ensures that reservations are
>  	 * not "stolen". The child may still get SIGKILLed
>  	 */
> -	if (!vma_has_reserves(vma) &&
> +	if (!vma_has_reserves(vma, chg) &&
>  			h->free_huge_pages - h->resv_huge_pages == 0)
>  		return NULL;
>
> @@ -553,8 +567,12 @@ retry_cpuset:
>  		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
>  			page = dequeue_huge_page_node(h, zone_to_nid(zone));
>  			if (page) {
> -				if (!avoid_reserve && vma_has_reserves(vma))
> -					h->resv_huge_pages--;
> +				if (avoid_reserve)
> +					break;
> +				if (!vma_has_reserves(vma, chg))
> +					break;
> +
> +				h->resv_huge_pages--;
>  				break;
>  			}
>  		}
> @@ -1135,7 +1153,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  		return ERR_PTR(-ENOSPC);
>  	}
>  	spin_lock(&hugetlb_lock);
> -	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
> +	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, chg);
>  	if (!page) {
>  		spin_unlock(&hugetlb_lock);
>  		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
> --
> 1.7.9.5
>

--
Michal Hocko
SUSE Labs