From: Naoya Horiguchi
To: Mike Kravetz
CC: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", "linux-api@vger.kernel.org", Dave Hansen, "David Rientjes", Hugh Dickins, "Davidlohr Bueso", Aneesh Kumar, Hillf Danton, Christoph Hellwig, Andrew Morton, Michal Hocko
Subject: Re: [PATCH v3 01/10] mm/hugetlb: add cache of descriptors to resv_map for region_add
Date: Fri, 17 Jul 2015 09:02:14 +0000
Message-ID: <20150717090213.GB32135@hori1.linux.bs1.fc.nec.co.jp>
References: <1436761268-6397-1-git-send-email-mike.kravetz@oracle.com> <1436761268-6397-2-git-send-email-mike.kravetz@oracle.com>
In-Reply-To: <1436761268-6397-2-git-send-email-mike.kravetz@oracle.com>

On Sun, Jul 12, 2015 at 09:20:59PM -0700, Mike Kravetz wrote:
> fallocate hole punch will want to remove a specific range of
> pages.  When pages are removed, their associated entries in
> the region/reserve map will also be removed.  This will break
> an assumption in the region_chg/region_add calling sequence.
> If a new region descriptor must be allocated, it is done as
> part of the region_chg processing.
> In this way, region_add
> can not fail because it does not need to attempt an allocation.
>
> To prepare for fallocate hole punch, create a "cache" of
> descriptors that can be used by region_add if necessary.
> region_chg will ensure there are sufficient entries in the
> cache.  It will be necessary to track the number of in progress
> add operations to know a sufficient number of descriptors
> reside in the cache.  A new routine region_abort is added to
> adjust this in progress count when add operations are aborted.
> vma_abort_reservation is also added for callers creating
> reservations with vma_needs_reservation/vma_commit_reservation.
>
> Signed-off-by: Mike Kravetz
> ---
>  include/linux/hugetlb.h |   3 +
>  mm/hugetlb.c            | 169 ++++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 153 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index d891f94..667cf44 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -35,6 +35,9 @@ struct resv_map {
>  	struct kref refs;
>  	spinlock_t lock;
>  	struct list_head regions;
> +	long adds_in_progress;
> +	struct list_head rgn_cache;
> +	long rgn_cache_count;
>  };
>  extern struct resv_map *resv_map_alloc(void);
>  void resv_map_release(struct kref *ref);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a8c3087..241d16d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -240,11 +240,14 @@ struct file_region {
>  
>  /*
>   * Add the huge page range represented by [f, t) to the reserve
> - * map.  Existing regions will be expanded to accommodate the
> - * specified range.  We know only existing regions need to be
> - * expanded, because region_add is only called after region_chg
> - * with the same range.  If a new file_region structure must
> - * be allocated, it is done in region_chg.
> + * map.  In the normal case, existing regions will be expanded
> + * to accommodate the specified range.  Sufficient regions should
> + * exist for expansion due to the previous call to region_chg
> + * with the same range.  However, it is possible that region_del
> + * could have been called after region_chg and modifed the map
> + * in such a way that no region exists to be expanded.  In this
> + * case, pull a region descriptor from the cache associated with
> + * the map and use that for the new range.
>   *
>   * Return the number of new huge pages added to the map.  This
>   * number is greater than or equal to zero.
> @@ -261,6 +264,27 @@ static long region_add(struct resv_map *resv, long f, long t)
>  		if (f <= rg->to)
>  			break;
>  
> +	if (&rg->link == head || t < rg->from) {
> +		/*
> +		 * No region exists which can be expanded to include the
> +		 * specified range.  Pull a region descriptor from the
> +		 * cache, and use it for this range.
> +		 */

This comment describes the condition of the if-block, not the VM_BUG_ON()
that follows it, so it would be better placed above the if-line.

> +		VM_BUG_ON(!resv->rgn_cache_count);

Checking resv->rgn_cache_count <= 0 might be safer.

...

> @@ -3236,11 +3360,14 @@ retry:
>  	 * any allocations necessary to record that reservation occur outside
>  	 * the spinlock.
>  	 */
> -	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
> +	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
>  		if (vma_needs_reservation(h, vma, address) < 0) {
>  			ret = VM_FAULT_OOM;
>  			goto backout_unlocked;
>  		}
> +		/* Just decrements count, does not deallocate */
> +		vma_abort_reservation(h, vma, address);
> +	}

This is not an "abort reservation" operation, yet you call the "abort
reservation" routine here, which might be confusing and could make future
maintenance harder.  I think this should be done in a simplified variant
of vma_commit_reservation() (perhaps just an alias of your
vma_abort_reservation()) or in a fast path in vma_commit_reservation().
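One way to read the alias suggestion above is shown in the minimal userspace sketch below. It is not kernel code. Both call sites share a helper that only drops the in-progress count, and each call site gets a name that matches its intent. All `demo_*` names, including `demo_vma_end_reservation()`, are hypothetical, and the struct models only the `adds_in_progress` counter from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for struct resv_map: only the counter of
 * pending add operations is modeled here. */
struct demo_resv_map {
	long adds_in_progress;
};

/* Models only the bookkeeping side of region_chg(): one more add is pending. */
static void demo_region_chg(struct demo_resv_map *resv)
{
	resv->adds_in_progress++;
}

/* Shared helper: retire one pending add without changing the map itself. */
static void demo_drop_pending_add(struct demo_resv_map *resv)
{
	assert(resv->adds_in_progress > 0);
	resv->adds_in_progress--;
}

/* Error path: the caller really is abandoning the reservation. */
static void demo_vma_abort_reservation(struct demo_resv_map *resv)
{
	demo_drop_pending_add(resv);
}

/* Fault path: the caller only wanted the needs-reservation answer, so it
 * ends the pending operation under a name that does not imply failure. */
static void demo_vma_end_reservation(struct demo_resv_map *resv)
{
	demo_drop_pending_add(resv);
}
```

The two wrappers behave identically. The alias exists only so that a reader of the fault path does not see "abort" on a path that has not failed.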
Thanks,
Naoya Horiguchi

>
>  	ptl = huge_pte_lockptr(h, mm, ptep);
>  	spin_lock(ptl);
> @@ -3387,6 +3514,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		ret = VM_FAULT_OOM;
>  		goto out_mutex;
>  	}
> +	/* Just decrements count, does not deallocate */
> +	vma_abort_reservation(h, vma, address);
>  
>  	if (!(vma->vm_flags & VM_MAYSHARE))
>  		pagecache_page = hugetlbfs_pagecache_page(h,
> @@ -3726,6 +3855,8 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	}
>  	return 0;
>  out_err:
> +	if (!vma || vma->vm_flags & VM_MAYSHARE)
> +		region_abort(resv_map, from, to);
>  	if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>  		kref_put(&resv_map->refs, resv_map_release);
>  	return ret;
> -- 
> 2.1.0
>
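The commit message quoted above describes a protocol with three parts. region_chg() may allocate, and it guarantees enough spare descriptors. region_add() consumes a descriptor only when nothing can be expanded. region_abort() retires a pending add. The sketch below models that protocol in userspace. It is a simplified illustration under stated assumptions, not the patch's code:

- there is no locking;
- the chg step does not compute a page count;
- all `toy_*` names are hypothetical;
- `toy_region_del()` only frees regions lying entirely inside the hole;
- descriptors absorbed during a merge are recycled into the cache instead of being freed.

The `cache_count > 0` assertion plays the role of the `VM_BUG_ON(rgn_cache_count <= 0)` check suggested in the review.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a file_region: huge page range [from, to). */
struct toy_region {
	long from, to;
	struct toy_region *next;
};

/* Hypothetical model of the patched resv_map fields. */
struct toy_resv_map {
	struct toy_region *regions;	/* sorted, non-overlapping */
	struct toy_region *cache;	/* spare descriptors (rgn_cache) */
	long cache_count;		/* rgn_cache_count */
	long adds_in_progress;
};

/* May fail: tops up the cache so every pending add could get a descriptor. */
static int toy_region_chg(struct toy_resv_map *m)
{
	while (m->cache_count < m->adds_in_progress + 1) {
		struct toy_region *r = malloc(sizeof(*r));

		if (!r)
			return -1;
		r->next = m->cache;
		m->cache = r;
		m->cache_count++;
	}
	m->adds_in_progress++;
	return 0;
}

/* Cannot fail: a new descriptor, if needed, comes from the cache. */
static void toy_region_add(struct toy_resv_map *m, long f, long t)
{
	struct toy_region **pp = &m->regions;

	/* Skip regions that end before f. */
	while (*pp && (*pp)->to < f)
		pp = &(*pp)->next;

	if (!*pp || t < (*pp)->from) {
		/* Nothing to expand (e.g. a hole punch removed it). */
		assert(m->cache_count > 0);
		struct toy_region *nrg = m->cache;

		m->cache = nrg->next;
		m->cache_count--;
		nrg->from = f;
		nrg->to = t;
		nrg->next = *pp;
		*pp = nrg;
	} else {
		struct toy_region *rg = *pp;

		if (f < rg->from)
			rg->from = f;
		if (t > rg->to)
			rg->to = t;
		/* Absorb later regions now overlapped; recycle their descriptors. */
		while (rg->next && rg->next->from <= rg->to) {
			struct toy_region *victim = rg->next;

			if (victim->to > rg->to)
				rg->to = victim->to;
			rg->next = victim->next;
			victim->next = m->cache;
			m->cache = victim;
			m->cache_count++;
		}
	}
	m->adds_in_progress--;
}

/* Caller gave up after toy_region_chg(): retire the pending add. */
static void toy_region_abort(struct toy_resv_map *m)
{
	assert(m->adds_in_progress > 0);
	m->adds_in_progress--;
}

/* Simplified hole punch: frees regions lying entirely inside [f, t). */
static void toy_region_del(struct toy_resv_map *m, long f, long t)
{
	struct toy_region **pp = &m->regions;

	while (*pp) {
		struct toy_region *rg = *pp;

		if (rg->from >= f && rg->to <= t) {
			*pp = rg->next;
			free(rg);
		} else {
			pp = &rg->next;
		}
	}
}
```

Keeping the invariant cache_count >= adds_in_progress places the only allocation in toy_region_chg(), where a failure can still be reported. As a result, toy_region_add() needs no error path, even when a hole punch runs between the chg and add steps.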