Date: Wed, 16 Sep 2020 17:58:51 +0200
From: Jan Kara
To: Amir Goldstein
Cc: Andreas Gruenbacher, Jan Kara, Theodore Tso, Martin Brandenburg,
	Mike Marshall, Damien Le Moal, Jaegeuk Kim, Qiuyang Sun, linux-xfs,
	Dave Chinner, linux-fsdevel, Linux MM, linux-kernel, Matthew Wilcox,
	Linus Torvalds, "Kirill A. Shutemov", Andrew Morton, Al Viro,
	nborisov@suse.de
Subject: Re: More filesystem need this fix (xfs: use MMAPLOCK around filemap_map_pages())
Message-ID: <20200916155851.GA1572@quack2.suse.cz>
References: <20200623052059.1893966-1-david@fromorbit.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat 12-09-20 09:19:11, Amir Goldstein wrote:
> On Tue, Jun 23, 2020 at 8:21 AM Dave Chinner wrote:
> >
> > From: Dave Chinner
> >
> > The page faultaround path ->map_pages is implemented in XFS via
> > filemap_map_pages(). This function checks that pages found in page
> > cache lookups have not raced with truncate based invalidation by
> > checking page->mapping is correct and page->index is within EOF.
> >
> > However, we've known for a long time that this is not sufficient to
> > protect against races with invalidations done by operations that do
> > not change EOF. e.g. hole punching and other fallocate() based
> > direct extent manipulations. The way we protect against these
> > races is we wrap the page fault operations in a XFS_MMAPLOCK_SHARED
> > lock so they serialise against fallocate and truncate before calling
> > into the filemap function that processes the fault.
> >
> > Do the same for XFS's ->map_pages implementation to close this
> > potential data corruption issue.
> >
> > Signed-off-by: Dave Chinner
> > ---
> >  fs/xfs/xfs_file.c | 15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 7b05f8fd7b3d..4b185a907432 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1266,10 +1266,23 @@ xfs_filemap_pfn_mkwrite(
> >  	return __xfs_filemap_fault(vmf, PE_SIZE_PTE, true);
> >  }
> >
> > +static void
> > +xfs_filemap_map_pages(
> > +	struct vm_fault		*vmf,
> > +	pgoff_t			start_pgoff,
> > +	pgoff_t			end_pgoff)
> > +{
> > +	struct inode		*inode = file_inode(vmf->vma->vm_file);
> > +
> > +	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> > +	filemap_map_pages(vmf, start_pgoff, end_pgoff);
> > +	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> > +}
> > +
> >  static const struct vm_operations_struct xfs_file_vm_ops = {
> >  	.fault		= xfs_filemap_fault,
> >  	.huge_fault	= xfs_filemap_huge_fault,
> > -	.map_pages	= filemap_map_pages,
> > +	.map_pages	= xfs_filemap_map_pages,
> >  	.page_mkwrite	= xfs_filemap_page_mkwrite,
> >  	.pfn_mkwrite	= xfs_filemap_pfn_mkwrite,
> >  };
> > --
> > 2.26.2.761.g0e0b3e54be
> >
>
> It appears that ext4, f2fs, gfs2, orangefs, zonefs also need this fix
>
> zonefs does not support hole punching, so it may not need to use
> mmap_sem at all.

So I've written an ext4 fix for this, but before I actually posted it,
Nikolay, who is working on a btrfs fix, asked why exactly
filemap_map_pages() is a problem, and I think he's right that it actually
isn't. The thing is: filemap_map_pages() never does any page mapping or
IO. It only creates PTEs for uptodate pages that are already in the page
cache. As such it is a rather different beast compared to the fault
handler from the fs POV and does not need protection from hole punching
(the current serialization on the page lock and the checking of
page->mapping are enough).

That being said, I agree this is subtle, and the moment someone adds e.g.
a readahead call into filemap_map_pages() we have a real problem.
I'm not sure how to prevent this risk...

								Honza
-- 
Jan Kara
SUSE Labs, CR
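
To make the argument above more concrete, here is a minimal userspace
sketch of the per-page checks described for filemap_map_pages(). It is
not kernel code: every name in it (struct model_mapping, struct
model_page, model_map_page() and so on) is hypothetical, and it is
single-threaded, so the "page lock" is only a flag. It is meant to show
the shape of the argument: a page is only mapped if, under its lock, it
still belongs to the mapping, is uptodate, and lies within EOF, and the
function never reads anything in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_mapping {
	size_t nr_pages;		/* file size in pages, i.e. EOF */
};

struct model_page {
	struct model_mapping *mapping;	/* NULL once invalidated */
	size_t index;			/* page offset within the file */
	bool uptodate;			/* contents valid in the cache */
	bool locked;			/* stands in for the page lock */
	bool mapped;			/* stands in for "a PTE exists" */
};

static bool model_trylock(struct model_page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

static void model_unlock(struct model_page *p)
{
	assert(p->locked);
	p->locked = false;
}

/* Model of invalidation (e.g. by hole punch): detach under the page lock. */
static void model_invalidate(struct model_page *p)
{
	bool got = model_trylock(p);

	assert(got);			/* single-threaded model: never contended */
	p->mapping = NULL;
	p->uptodate = false;
	p->mapped = false;
	model_unlock(p);
}

/*
 * Map one page if it is safe to do so. Never waits and never does IO:
 * a page that fails any check is simply skipped.
 */
static bool model_map_page(struct model_mapping *m, struct model_page *p)
{
	bool ok = false;

	if (!model_trylock(p))
		return false;
	if (p->mapping == m && p->uptodate && p->index < m->nr_pages) {
		p->mapped = true;
		ok = true;
	}
	model_unlock(p);
	return ok;
}

/* Try to map every page in the array; returns how many were mapped. */
static size_t model_map_pages(struct model_mapping *m,
			      struct model_page *pages, size_t n)
{
	size_t mapped = 0;

	for (size_t i = 0; i < n; i++)
		if (model_map_page(m, &pages[i]))
			mapped++;
	return mapped;
}
```

In this model, a page that has been through model_invalidate() can never
be mapped again by model_map_page(), because the mapping check fails
under the lock. The risk mentioned above corresponds to adding a step
that instantiates or reads in new pages here, which these checks do not
cover.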