Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751705AbbKIG7A (ORCPT ); Mon, 9 Nov 2015 01:59:00 -0500 Received: from TYO201.gate.nec.co.jp ([210.143.35.51]:60224 "EHLO tyo201.gate.nec.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750814AbbKIG67 convert rfc822-to-8bit (ORCPT ); Mon, 9 Nov 2015 01:58:59 -0500 From: Naoya Horiguchi To: Mike Kravetz CC: "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , Hugh Dickins , "Andrew Morton" , Dave Hansen , Davidlohr Bueso Subject: Re: [PATCH] mm/hugetlbfs Fix bugs in fallocate hole punch of areas with holes Thread-Topic: [PATCH] mm/hugetlbfs Fix bugs in fallocate hole punch of areas with holes Thread-Index: AQHRE2s57zhY330tRE+FAp79eXk1v56Sue2A Date: Mon, 9 Nov 2015 06:57:05 +0000 Message-ID: <20151109065655.GA12428@hori1.linux.bs1.fc.nec.co.jp> References: <1446247932-11348-1-git-send-email-mike.kravetz@oracle.com> In-Reply-To: <1446247932-11348-1-git-send-email-mike.kravetz@oracle.com> Accept-Language: ja-JP, en-US Content-Language: ja-JP X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.128.101.11] Content-Type: text/plain; charset="iso-2022-jp" Content-ID: Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4475 Lines: 126 On Fri, Oct 30, 2015 at 04:32:12PM -0700, Mike Kravetz wrote: > Hugh Dickins pointed out problems with the new hugetlbfs fallocate > hole punch code. These problems are in the routine remove_inode_hugepages > and mostly occur in the case where there are holes in the range of > pages to be removed. These holes could be the result of a previous hole > punch or simply sparse allocation. > > remove_inode_hugepages handles both hole punch and truncate operations. > Page index handling was fixed/cleaned up so that holes are properly > handled. In addition, code was changed to ensure multiple passes of the > address range only happens in the truncate case. More comments were added > to explain the different actions in each case. A cond_resched() was added > after removing up to PAGEVEC_SIZE pages. > > Some totally unnecessary code in hugetlbfs_fallocate() that remained from > early development was also removed. > > Signed-off-by: Mike Kravetz > --- > fs/hugetlbfs/inode.c | 44 +++++++++++++++++++++++++++++--------------- > 1 file changed, 29 insertions(+), 15 deletions(-) > > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c > index 316adb9..30cf534 100644 > --- a/fs/hugetlbfs/inode.c > +++ b/fs/hugetlbfs/inode.c > @@ -368,10 +368,25 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, > lookup_nr = end - next; > > /* > - * This pagevec_lookup() may return pages past 'end', > - * so we must check for page->index > end. > + * When no more pages are found, take different action for > + * hole punch and truncate. > + * > + * For hole punch, this indicates we have removed each page > + * within the range and are done. Note that pages may have > + * been faulted in after being removed in the hole punch case. > + * This is OK as long as each page in the range was removed > + * once. > + * > + * For truncate, we need to make sure all pages within the > + * range are removed when exiting this routine. We could > + * have raced with a fault that brought in a page after it > + * was first removed. Check the range again until no pages > + * are found. > */ > if (!pagevec_lookup(&pvec, mapping, next, lookup_nr)) { > + if (!truncate_op) > + break; > + > if (next == start) > break; > next = start; > @@ -382,19 +397,23 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, > struct page *page = pvec.pages[i]; > u32 hash; > > + /* > + * The page (index) could be beyond end. This is > + * only possible in the punch hole case as end is > + * LLONG_MAX for truncate. > + */ > + if (page->index >= end) { > + next = end; /* we are done */ > + break; > + } > + next = page->index; > + > hash = hugetlb_fault_mutex_hash(h, current->mm, > &pseudo_vma, > mapping, next, 0); > mutex_lock(&hugetlb_fault_mutex_table[hash]); > > lock_page(page); > - if (page->index >= end) { > - unlock_page(page); > - mutex_unlock(&hugetlb_fault_mutex_table[hash]); > - next = end; /* we are done */ > - break; > - } > - > /* > * If page is mapped, it was faulted in after being > * unmapped. Do nothing in this race case. In the > @@ -423,15 +442,13 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, > } > } > > - if (page->index > next) > - next = page->index; > - > ++next; You set next = page->index above, so this increment takes effect only in the final iteration. Can we put this outside (just after) this for-loop? Thanks, Naoya Horiguchi > unlock_page(page); > > mutex_unlock(&hugetlb_fault_mutex_table[hash]); > } > huge_pagevec_release(&pvec); > + cond_resched(); > } > > if (truncate_op) > @@ -647,9 +664,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, > if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size) > i_size_write(inode, offset + len); > inode->i_ctime = CURRENT_TIME; > - spin_lock(&inode->i_lock); > - inode->i_private = NULL; > - spin_unlock(&inode->i_lock); > out: > mutex_unlock(&inode->i_mutex); > return error; > -- > 2.4.3 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/