Subject: [PATCH v8 17/18] xfs: prepare xfs_break_layouts() for another layout type
From: Dan Williams
To: linux-nvdimm@lists.01.org
Cc: Ross Zwisler, "Darrick J. Wong", Dave Chinner, Christoph Hellwig, Christoph Hellwig, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, jack@suse.cz, snitzer@redhat.com
Date: Fri, 30 Mar 2018 21:03:41 -0700
Message-ID: <152246902093.36038.6900888641940957516.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <152246892890.36038.18436540150980653229.stgit@dwillia2-desk3.amr.corp.intel.com>

When xfs is operating as the back-end of a pNFS block server, it prevents collisions between local and remote operations by requiring a lease to be held for remotely accessed blocks. Local filesystem operations break those leases before writing to the file or mutating its extent map.

A similar mechanism is needed to prevent operations on pinned dax mappings, such as device-DMA, from colliding with extent unmap operations.

BREAK_WRITE and BREAK_UNMAP are introduced as two distinct levels of layout breaking.

Layouts are broken in the BREAK_WRITE case to ensure that layout-holders do not collide with local writes. Additionally, layouts are broken in the BREAK_UNMAP case to make sure the layout-holder has a consistent view of the file's extent map. While BREAK_WRITE breaks can be satisfied by recalling FL_LAYOUT leases, BREAK_UNMAP breaks additionally require waiting for busy dax-pages to go idle while holding XFS_MMAPLOCK_EXCL.

After this refactoring, xfs_break_layouts() becomes the entry point for coordinating both types of breaks, and xfs_break_leased_layouts() (the renamed pNFS helper) becomes just the BREAK_WRITE handler.

Note that the unlock tracking (the new did_unlock argument) is needed by a follow-on change.
That change will coordinate retrying either break handler until both successfully test for a lease break while maintaining the lock state.

Cc: Ross Zwisler
Cc: "Darrick J. Wong"
Reported-by: Dave Chinner
Reported-by: Christoph Hellwig
Reviewed-by: Christoph Hellwig
Signed-off-by: Dan Williams
---
 fs/xfs/xfs_file.c  | 30 ++++++++++++++++++++++++++++--
 fs/xfs/xfs_inode.h | 16 ++++++++++++++++
 fs/xfs/xfs_ioctl.c |  3 +--
 fs/xfs/xfs_iops.c  |  6 +++---
 fs/xfs/xfs_pnfs.c  | 13 +++++++------
 fs/xfs/xfs_pnfs.h  |  6 ++++--
 6 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 18edf04811d0..51e6506bdcb1 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -350,7 +350,7 @@ xfs_file_aio_write_checks(
 	if (error <= 0)
 		return error;
 
-	error = xfs_break_layouts(inode, iolock);
+	error = xfs_break_layouts(inode, iolock, BREAK_WRITE);
 	if (error)
 		return error;
 
@@ -752,6 +752,32 @@ xfs_file_write_iter(
 	return ret;
 }
 
+int
+xfs_break_layouts(
+	struct inode		*inode,
+	uint			*iolock,
+	enum layout_break_reason reason)
+{
+	bool			retry = false;
+	int			error = 0;
+
+	ASSERT(xfs_isilocked(XFS_I(inode), XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL));
+
+	switch (reason) {
+	case BREAK_UNMAP:
+		ASSERT(xfs_isilocked(XFS_I(inode), XFS_MMAPLOCK_EXCL));
+		/* fall through */
+	case BREAK_WRITE:
+		error = xfs_break_leased_layouts(inode, iolock, &retry);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
+	return error;
+}
+
 #define XFS_FALLOC_FL_SUPPORTED						\
 		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
 		 FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |	\
@@ -778,7 +804,7 @@ xfs_file_fallocate(
 		return -EOPNOTSUPP;
 
 	xfs_ilock(ip, iolock);
-	error = xfs_break_layouts(inode, &iolock);
+	error = xfs_break_layouts(inode, &iolock, BREAK_UNMAP);
 	if (error)
 		goto out_unlock;
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 3e8dc990d41c..7e1a077dfc04 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -379,6 +379,20 @@ static inline void xfs_ifunlock(struct xfs_inode *ip)
 					>> XFS_ILOCK_SHIFT)
 
 /*
+ * Layouts are broken in the BREAK_WRITE case to ensure that
+ * layout-holders do not collide with local writes. Additionally,
+ * layouts are broken in the BREAK_UNMAP case to make sure the
+ * layout-holder has a consistent view of the file's extent map. While
+ * BREAK_WRITE breaks can be satisfied by recalling FL_LAYOUT leases,
+ * BREAK_UNMAP breaks additionally require waiting for busy dax-pages to
+ * go idle.
+ */
+enum layout_break_reason {
+	BREAK_WRITE,
+	BREAK_UNMAP,
+};
+
+/*
  * For multiple groups support: if S_ISGID bit is set in the parent
  * directory, group of new file is set to that of the parent, and
  * new subdirectory gets S_ISGID bit from parent.
@@ -447,6 +461,8 @@ int	xfs_zero_eof(struct xfs_inode *ip, xfs_off_t offset,
 		     xfs_fsize_t isize, bool *did_zeroing);
 int	xfs_zero_range(struct xfs_inode *ip, xfs_off_t pos, xfs_off_t count,
 		bool *did_zero);
+int	xfs_break_layouts(struct inode *inode, uint *iolock,
+		enum layout_break_reason reason);
 
 /* from xfs_iops.c */
 extern void	xfs_setup_inode(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 4151fade4bb1..91e73d663099 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -39,7 +39,6 @@
 #include "xfs_icache.h"
 #include "xfs_symlink.h"
 #include "xfs_trans.h"
-#include "xfs_pnfs.h"
 #include "xfs_acl.h"
 #include "xfs_btree.h"
 #include
@@ -644,7 +643,7 @@ xfs_ioc_space(
 		return error;
 
 	xfs_ilock(ip, iolock);
-	error = xfs_break_layouts(inode, &iolock);
+	error = xfs_break_layouts(inode, &iolock, BREAK_UNMAP);
 	if (error)
 		goto out_unlock;
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index d23aa08426f9..04abb077e91a 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -37,7 +37,6 @@
 #include "xfs_da_btree.h"
 #include "xfs_dir2.h"
 #include "xfs_trans_space.h"
-#include "xfs_pnfs.h"
 #include "xfs_iomap.h"
 
 #include
@@ -1027,13 +1026,14 @@ xfs_vn_setattr(
 	int			error;
 
 	if (iattr->ia_valid & ATTR_SIZE) {
-		struct xfs_inode	*ip = XFS_I(d_inode(dentry));
+		struct inode		*inode = d_inode(dentry);
+		struct xfs_inode	*ip = XFS_I(inode);
 		uint			iolock;
 
 		xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
 		iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
 
-		error = xfs_break_layouts(d_inode(dentry), &iolock);
+		error = xfs_break_layouts(inode, &iolock, BREAK_UNMAP);
 		if (error) {
 			xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
 			return error;
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index 6ea7b0b55d02..40e69edb7e2e 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -31,17 +31,18 @@
  * rules in the page fault path we don't bother.
  */
 int
-xfs_break_layouts(
+xfs_break_leased_layouts(
 	struct inode		*inode,
-	uint			*iolock)
+	uint			*iolock,
+	bool			*did_unlock)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	int			error;
 
-	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL));
-
+	*did_unlock = false;
 	while ((error = break_layout(inode, false) == -EWOULDBLOCK)) {
 		xfs_iunlock(ip, *iolock);
+		*did_unlock = true;
 		error = break_layout(inode, true);
 		*iolock &= ~XFS_IOLOCK_SHARED;
 		*iolock |= XFS_IOLOCK_EXCL;
@@ -121,8 +122,8 @@ xfs_fs_map_blocks(
 	 * Lock out any other I/O before we flush and invalidate the pagecache,
 	 * and then hand out a layout to the remote system. This is very
 	 * similar to direct I/O, except that the synchronization is much more
-	 * complicated. See the comment near xfs_break_layouts for a detailed
-	 * explanation.
+	 * complicated. See the comment near xfs_break_leased_layouts
+	 * for a detailed explanation.
 	 */
 	xfs_ilock(ip, XFS_IOLOCK_EXCL);
 
diff --git a/fs/xfs/xfs_pnfs.h b/fs/xfs/xfs_pnfs.h
index bf45951e28fe..0f2f51037064 100644
--- a/fs/xfs/xfs_pnfs.h
+++ b/fs/xfs/xfs_pnfs.h
@@ -9,11 +9,13 @@ int xfs_fs_map_blocks(struct inode *inode, loff_t offset, u64 length,
 int xfs_fs_commit_blocks(struct inode *inode, struct iomap *maps, int nr_maps,
 		struct iattr *iattr);
 
-int xfs_break_layouts(struct inode *inode, uint *iolock);
+int xfs_break_leased_layouts(struct inode *inode, uint *iolock,
+		bool *did_unlock);
 #else
 static inline int
-xfs_break_layouts(struct inode *inode, uint *iolock)
+xfs_break_leased_layouts(struct inode *inode, uint *iolock, bool *did_unlock)
 {
+	*did_unlock = false;
 	return 0;
 }
 #endif /* CONFIG_EXPORTFS_BLOCK_OPS */
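For readers following the locking protocol, the user-space C sketch below models how xfs_break_layouts() dispatches on the break reason and how xfs_break_leased_layouts() reports, through did_unlock, that it dropped and retook the caller's locks. It is a simplified model, not kernel code: struct model_inode, the M_* lock bits, the outstanding_leases counter and the model_* functions are hypothetical stand-ins for the XFS inode locks and break_layout(). As in this patch, the BREAK_UNMAP case only adds the XFS_MMAPLOCK_EXCL assertion before falling through; waiting for busy dax-pages is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same two levels as the patch's enum layout_break_reason. */
enum layout_break_reason {
	BREAK_WRITE,	/* recall leases so holders don't collide with writes */
	BREAK_UNMAP,	/* additionally requires the mmap lock held exclusively */
};

/* Hypothetical lock bits, modelled on XFS_IOLOCK_* and XFS_MMAPLOCK_EXCL. */
#define M_IOLOCK_SHARED	(1u << 0)
#define M_IOLOCK_EXCL	(1u << 1)
#define M_MMAPLOCK_EXCL	(1u << 2)

/* Hypothetical inode: which locks are held and how many leases remain. */
struct model_inode {
	unsigned int held;		/* lock bits currently held */
	int outstanding_leases;		/* leases a remote holder must return */
};

/*
 * BREAK_WRITE handler: while a lease is outstanding, drop every lock in
 * *iolock, recall the lease, then retake the iolock exclusively. Report
 * through *did_unlock whether the locks were ever dropped, so a caller
 * knows that state it checked under the lock may now be stale.
 */
static int model_break_leased_layouts(struct model_inode *ip,
				      unsigned int *iolock, bool *did_unlock)
{
	*did_unlock = false;
	while (ip->outstanding_leases > 0) {
		ip->held &= ~*iolock;		/* unlock before blocking */
		*did_unlock = true;
		ip->outstanding_leases--;	/* stands in for a blocking recall */
		*iolock &= ~M_IOLOCK_SHARED;
		*iolock |= M_IOLOCK_EXCL;
		ip->held |= *iolock;		/* relock, now exclusive */
	}
	return 0;
}

/* Entry point: dispatch on the break level, mirroring xfs_break_layouts(). */
static int model_break_layouts(struct model_inode *ip, unsigned int *iolock,
			       enum layout_break_reason reason)
{
	bool did_unlock = false;

	assert(ip->held & (M_IOLOCK_SHARED | M_IOLOCK_EXCL));

	switch (reason) {
	case BREAK_UNMAP:
		/* An unmap break is a write break plus the mmap lock check. */
		assert(ip->held & M_MMAPLOCK_EXCL);
		/* fall through */
	case BREAK_WRITE:
		return model_break_leased_layouts(ip, iolock, &did_unlock);
	default:
		return -EINVAL;
	}
}
```

Starting from a shared iolock with two outstanding leases, model_break_leased_layouts() returns with did_unlock set and the iolock upgraded to exclusive. A caller that sees did_unlock set should treat anything it checked under the lock as stale; the commit message describes the follow-on change as using this to retry the break handlers until both succeed while maintaining the lock state.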