Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751252AbdH0RPm (ORCPT ); Sun, 27 Aug 2017 13:15:42 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:21047 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751146AbdH0RPl (ORCPT ); Sun, 27 Aug 2017 13:15:41 -0400 Subject: Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate() To: Nadav Amit , Nadia Yvette Chambers Cc: linux-kernel@vger.kernel.org, Nadav Amit , Eric Biggers References: <20170826210905.GA21712@zzz.localdomain> <20170826191124.51642-1-namit@vmware.com> From: Mike Kravetz Message-ID: <6bf36198-0693-5735-7180-6529aa4c29e4@oracle.com> Date: Sun, 27 Aug 2017 10:15:36 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <20170826191124.51642-1-namit@vmware.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Source-IP: userv0022.oracle.com [156.151.31.74] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2519 Lines: 65 On 08/26/2017 12:11 PM, Nadav Amit wrote: > hugetlfs_fallocate() currently performs put_page() before unlock_page(). > This scenario opens a small time window, from the time the page is added > to the page cache, until it is unlocked, in which the page might be > removed from the page-cache by another core. If the page is removed > during this time windows, it might cause a memory corruption, as the > wrong page will be unlocked. > > It is arguable whether this scenario can happen in a real system, and > there are several mitigating factors. The issue was found by code > inspection (actually grep), and not by actually triggering the flow. > Yet, since putting the page before unlocking is incorrect it should be > fixed, if only to prevent future breakage or someone copy-pasting this > code. > > Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()") > > cc: Eric Biggers > cc: Mike Kravetz > > Signed-off-by: Nadav Amit Thank you Nadav. Reviewed-by: Mike Kravetz Since hugetlbfs is an in memory filesystem, the only way one 'should' be able to remove a page (file content) is through an inode operation such as truncate, hole punch, or unlink. That was the basis for my response that the inode lock would be required for page freeing. Eric's question about sys_fadvise64(POSIX_FADV_DONTNEED) is interesting. I was expecting to see a check for hugetlbfs pages and exit (without modification) if encountered. A quick review of the code did not find any such checks. I'll take a closer look to determine exactly how hugetlbfs files are handled. IMO, there should be something similar to the DAX check where the routine quickly exits. -- Mike Kravetz > --- > fs/hugetlbfs/inode.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c > index 28d2753be094..9475fee79cee 100644 > --- a/fs/hugetlbfs/inode.c > +++ b/fs/hugetlbfs/inode.c > @@ -655,11 +655,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, > mutex_unlock(&hugetlb_fault_mutex_table[hash]); > > /* > - * page_put due to reference from alloc_huge_page() > * unlock_page because locked by add_to_page_cache() > + * page_put due to reference from alloc_huge_page() > */ > - put_page(page); > unlock_page(page); > + put_page(page); > } > > if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size) >