Subject: [PATCH v8 18/18] xfs, dax: introduce xfs_break_dax_layouts()
From: Dan Williams
To: linux-nvdimm@lists.01.org
Cc: Jan Kara, Dave Chinner, "Darrick J. Wong", Ross Zwisler, Christoph Hellwig, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, jack@suse.cz, snitzer@redhat.com
Date: Fri, 30 Mar 2018 21:03:46 -0700
Message-ID: <152246902607.36038.15813002361509305325.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <152246892890.36038.18436540150980653229.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <152246892890.36038.18436540150980653229.stgit@dwillia2-desk3.amr.corp.intel.com>

xfs_break_dax_layouts(), similar to xfs_break_leased_layouts(), scans for
busy / pinned dax pages and waits for those pages to go idle before any
potential extent unmap operation.

dax_layout_busy_page() handles synchronizing against new page-busy events
(get_user_pages). It invalidates all mappings to trigger the
get_user_pages slow path, which will eventually block on the xfs inode
lock held in XFS_MMAPLOCK_EXCL mode. If dax_layout_busy_page() finds a
busy page, it returns it so that xfs can wait for the page-idle event,
which fires when the page reference count reaches 1 (recall that
ZONE_DEVICE pages are idle at count 1, see generic_dax_pagefree()).

While waiting, the XFS_MMAPLOCK_EXCL lock is dropped to avoid
deadlocking a process that may be trying to elevate the page count of
more pages before arranging for any of them to go idle. I.e. in the
typical I/O submission case, iov_iter_get_pages() elevates the reference
count of all pages in the I/O before starting I/O on the first page.
Elevating the reference count of all pages involved in an I/O may cause
faults that need to take the XFS_MMAPLOCK (in shared mode), which cannot
be acquired while the unmap path holds it in XFS_MMAPLOCK_EXCL mode.

Cc: Jan Kara
Cc: Dave Chinner
Cc: "Darrick J. Wong"
Cc: Ross Zwisler
Reviewed-by: Christoph Hellwig
Signed-off-by: Dan Williams
---
 fs/xfs/xfs_file.c |   60 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 49 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 51e6506bdcb1..0342f6fb782f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -752,6 +752,38 @@ xfs_file_write_iter(
 	return ret;
 }
 
+static void
+xfs_wait_var_event(
+	struct inode		*inode,
+	uint			iolock,
+	bool			*did_unlock)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+
+	*did_unlock = true;
+	xfs_iunlock(ip, iolock);
+	schedule();
+	xfs_ilock(ip, iolock);
+}
+
+static int
+xfs_break_dax_layouts(
+	struct inode		*inode,
+	uint			iolock,
+	bool			*did_unlock)
+{
+	struct page		*page;
+
+	*did_unlock = false;
+	page = dax_layout_busy_page(inode->i_mapping);
+	if (!page)
+		return 0;
+
+	return ___wait_var_event(&page->_refcount,
+			atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE,
+			0, 0, xfs_wait_var_event(inode, iolock, did_unlock));
+}
+
 int
 xfs_break_layouts(
 	struct inode		*inode,
@@ -763,17 +795,23 @@ xfs_break_layouts(
 
 	ASSERT(xfs_isilocked(XFS_I(inode), XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL));
 
-	switch (reason) {
-	case BREAK_UNMAP:
-		ASSERT(xfs_isilocked(XFS_I(inode), XFS_MMAPLOCK_EXCL));
-		/* fall through */
-	case BREAK_WRITE:
-		error = xfs_break_leased_layouts(inode, iolock, &retry);
-		break;
-	default:
-		WARN_ON_ONCE(1);
-		return -EINVAL;
-	}
+	do {
+		retry = false;
+		switch (reason) {
+		case BREAK_UNMAP:
+			ASSERT(xfs_isilocked(XFS_I(inode), XFS_MMAPLOCK_EXCL));
+			error = xfs_break_dax_layouts(inode, XFS_MMAPLOCK_EXCL, &retry);
+			if (error || retry)
+				break;
+			/* fall through */
+		case BREAK_WRITE:
+			error = xfs_break_leased_layouts(inode, iolock, &retry);
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			return -EINVAL;
+		}
+	} while (error == 0 && retry);
 
 	return error;
 }
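For readers less familiar with the kernel primitives, the wait-while-dropping-the-lock protocol described in the commit message can be approximated in userspace C. This is a hedged analogue under stated assumptions, not the kernel code: pthreads stand in for kernel primitives, a plain mutex named mmaplock models the XFS_MMAPLOCK (shared vs. exclusive modes are collapsed into one), a condition variable replaces ___wait_var_event() on page->_refcount, and every sketch_* name is hypothetical. A page is idle at reference count 1; the breaker drops mmaplock while it sleeps so that a pinning thread, which must take mmaplock the way a get_user_pages() fault would, can make progress; and the outer loop rescans until a pass finds nothing busy.

```c
/*
 * Userspace analogue (assumption): pthreads stand in for kernel primitives,
 * mmaplock for the XFS_MMAPLOCK, sketch_page for a ZONE_DEVICE page.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_page {
	pthread_mutex_t ref_lock; /* protects refcount */
	pthread_cond_t idle;      /* broadcast when refcount drops to 1 */
	int refcount;             /* 1 == idle, > 1 == pinned */
};

static pthread_mutex_t mmaplock = PTHREAD_MUTEX_INITIALIZER;

static void sketch_page_put(struct sketch_page *p)
{
	pthread_mutex_lock(&p->ref_lock);
	assert(p->refcount > 1); /* never drop below the idle count */
	if (--p->refcount == 1)
		pthread_cond_broadcast(&p->idle);
	pthread_mutex_unlock(&p->ref_lock);
}

/*
 * Called with mmaplock held. The refcount check stands in for
 * dax_layout_busy_page(); a busy page is waited on with mmaplock dropped.
 * Always returns 0 here; an interruptible kernel wait can fail on a signal.
 */
static int sketch_break_dax_layouts(struct sketch_page *p, bool *did_unlock)
{
	*did_unlock = false;
	pthread_mutex_lock(&p->ref_lock);
	while (p->refcount != 1) {
		if (!*did_unlock) {
			*did_unlock = true;
			pthread_mutex_unlock(&mmaplock); /* let pinners fault */
		}
		pthread_cond_wait(&p->idle, &p->ref_lock);
	}
	pthread_mutex_unlock(&p->ref_lock);
	if (*did_unlock)
		pthread_mutex_lock(&mmaplock); /* retake before returning */
	return 0;
}

/* Retry loop in the spirit of xfs_break_layouts(): rescan until clean. */
static int sketch_break_layouts(struct sketch_page *p, int *passes)
{
	bool retry;
	int error;

	*passes = 0;
	do {
		error = sketch_break_dax_layouts(p, &retry);
		(*passes)++;
	} while (error == 0 && retry);
	return error;
}

/* Simulated I/O submitter: must take mmaplock (like a fault) to pin more. */
static void *sketch_pinner(void *arg)
{
	struct sketch_page *p = arg;

	pthread_mutex_lock(&mmaplock);
	pthread_mutex_lock(&p->ref_lock);
	p->refcount++; /* pin one more reference */
	pthread_mutex_unlock(&p->ref_lock);
	pthread_mutex_unlock(&mmaplock);
	sketch_page_put(p); /* I/O done: drop both references */
	sketch_page_put(p);
	return NULL;
}

/* Returns the final refcount (1 when idle) and the number of scan passes. */
int sketch_demo(int *passes)
{
	struct sketch_page page = { .refcount = 2 }; /* pinner holds one ref */
	pthread_t t;
	int error;

	pthread_mutex_init(&page.ref_lock, NULL);
	pthread_cond_init(&page.idle, NULL);
	pthread_mutex_lock(&mmaplock); /* the unmap path holds the lock */
	pthread_create(&t, NULL, sketch_pinner, &page);
	error = sketch_break_layouts(&page, passes);
	pthread_mutex_unlock(&mmaplock);
	pthread_join(t, NULL);
	pthread_cond_destroy(&page.idle);
	pthread_mutex_destroy(&page.ref_lock);
	return error == 0 ? page.refcount : -1;
}
```

sketch_demo() starts with mmaplock held and the page pinned (count 2). The pinner can only take mmaplock because the breaker released it while sleeping, so the call returns 1 after two scan passes: one that waited and one that found the page idle. If the breaker kept mmaplock across the wait, the pinner would block forever on mmaplock and the breaker would never see the count reach 1, which is the deadlock the commit message describes. Unlike the kernel macro, which runs its unlock/schedule/lock command on every sleep, this sketch drops the lock once per wait for brevity.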