In-Reply-To: <20200623052059.1893966-1-david@fromorbit.com>
From: Amir Goldstein
Date: Sat, 12 Sep 2020 09:19:11 +0300
Subject: More filesystem need this fix (xfs: use MMAPLOCK around filemap_map_pages())
To: Andreas Gruenbacher, Jan Kara, Theodore Tso, Martin Brandenburg, Mike Marshall, Damien Le Moal, Jaegeuk Kim, Qiuyang Sun
Cc: linux-xfs, Dave Chinner, linux-fsdevel, Linux MM, linux-kernel, Matthew Wilcox, Linus Torvalds, "Kirill A. Shutemov", Andrew Morton, Al Viro
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 23, 2020 at 8:21 AM Dave Chinner wrote:
>
> From: Dave Chinner
>
> The page faultaround path ->map_pages is implemented in XFS via
> filemap_map_pages(). This function checks that pages found in page
> cache lookups have not raced with truncate based invalidation by
> checking page->mapping is correct and page->index is within EOF.
>
> However, we've known for a long time that this is not sufficient to
> protect against races with invalidations done by operations that do
> not change EOF. e.g. hole punching and other fallocate() based
> direct extent manipulations.
> The way we protect against these races is we wrap the page fault
> operations in a XFS_MMAPLOCK_SHARED lock so they serialise against
> fallocate and truncate before calling into the filemap function that
> processes the fault.
>
> Do the same for XFS's ->map_pages implementation to close this
> potential data corruption issue.
>
> Signed-off-by: Dave Chinner
> ---
>  fs/xfs/xfs_file.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 7b05f8fd7b3d..4b185a907432 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1266,10 +1266,23 @@ xfs_filemap_pfn_mkwrite(
>  	return __xfs_filemap_fault(vmf, PE_SIZE_PTE, true);
>  }
>
> +static void
> +xfs_filemap_map_pages(
> +	struct vm_fault *vmf,
> +	pgoff_t start_pgoff,
> +	pgoff_t end_pgoff)
> +{
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +
> +	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> +	filemap_map_pages(vmf, start_pgoff, end_pgoff);
> +	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> +}
> +
>  static const struct vm_operations_struct xfs_file_vm_ops = {
>  	.fault = xfs_filemap_fault,
>  	.huge_fault = xfs_filemap_huge_fault,
> -	.map_pages = filemap_map_pages,
> +	.map_pages = xfs_filemap_map_pages,
>  	.page_mkwrite = xfs_filemap_page_mkwrite,
>  	.pfn_mkwrite = xfs_filemap_pfn_mkwrite,
>  };
> --
> 2.26.2.761.g0e0b3e54be
>

It appears that ext4, f2fs, gfs2, orangefs and zonefs also need this fix.

zonefs does not support hole punching, so it may not need to use
mmap_sem at all.

It is interesting to look at how this bug came to be duplicated in so
many filesystems, because there are lessons to be learned.

Commit f1820361f83d ("mm: implement ->map_pages for page cache")
added the ->map_pages() operation, and its commit message said:

"...It should be safe to use filemap_map_pages() for ->map_pages() if
filesystem use filemap_fault() for ->fault()."
At the time, all of the aforementioned filesystems used filemap_fault()
for ->fault().

But since then, ext4, xfs, f2fs and, just recently, gfs2 have added a
filesystem-specific ->fault() operation.

orangefs has since added vm_operations, and zonefs was added later,
probably copying the mmap_sem handling from ext4. Both have a
filesystem-specific ->fault() operation.

It surprised me to see that some of the filesystem developers who signed
off on the added ->fault() operations are no strangers to mm. The recent
gfs2 change was even reviewed by an established mm developer [1].

So what can we learn from this case study? How could we fix the
interface to avoid repeating the same mistake in the future?

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20200703113801.GD25523@casper.infradead.org/