Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp3745038pxk; Tue, 29 Sep 2020 05:20:56 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzAphDwCTUowzzkfusEP9d07JPlMll7CpUnaVKtcX3weL6lkmsFWmhvq5yO9gSrSFp2IYCp X-Received: by 2002:a17:906:3957:: with SMTP id g23mr3808995eje.24.1601382056726; Tue, 29 Sep 2020 05:20:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1601382056; cv=none; d=google.com; s=arc-20160816; b=jd1cqpB3Bv9tkC/VG55OPtzsmDQs7K4p7rbH1x6lNVrF5vF0rDzKnhGcnfMXHpgbAe rLejcs+v4bwGzi00Fsfl2McxVYDqyy/QiGAsteHFR5o++FtAic1IVwOQUiWnCGudrMEZ e2uF4qNmFF3Srk0UidkKz2G8rtZyajedXK4WwTzytRKKXLrVSDAoTNGNAr+yc7zzGSN7 0aCBCUYOocHjl8MF/UDT6rA3hvFr/XPI17OqgoXIbHc/PgICGqn44vmf3DaO0/c8AhI3 EwLX8F4u1S/SwLEBr3gKJn5At7O4o2JuMTfp7cs58VrqBS0PZZ+WyVJ47bXM2ppwBTYb kkog== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=EiCSvG1ir0hbRKhmI8krfSMxci7kraQpS9x0NTCoAL0=; b=uRXnVAEU6G4B/AulghttXv2TuatPTydO36ccY569ySLYCtuQob9C4E4L3EvAHzXO5L u3dXlVP0MSGVwWNu0DpqihlnWF/F4g/LhRPaEG7CN3zUXZEZlW89GvPRj/MGLCrpyuU+ J//27lXJPYPKNk5D8m551j2ysMEQrAxaF51HNu78k00qnofVXlx3llJ2FsnLTHoEiWtm k0yifxeJTU9l9EfDn5Mctw5a2XwDaVptZ37YD37SPRX0OMlHkgx3H4DUXwkOTAUTJomm O4qnryYHfAO/kVWpqb0FmiydroHO09SR7SDOpeyxZa6CLb+YOupVZgShVBXPLSsCK/Sr RZqA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=fcjdBAE6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id dt2si2374849ejc.630.2020.09.29.05.20.32; Tue, 29 Sep 2020 05:20:56 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=fcjdBAE6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732216AbgI2MRc (ORCPT + 99 others); Tue, 29 Sep 2020 08:17:32 -0400 Received: from mail.kernel.org ([198.145.29.99]:49162 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730033AbgI2Lg7 (ORCPT ); Tue, 29 Sep 2020 07:36:59 -0400 Received: from localhost (83-86-74-64.cable.dynamic.v4.ziggo.nl [83.86.74.64]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 8AE3323EB1; Tue, 29 Sep 2020 11:32:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601379137; bh=3QIRnvGoGR32Y22SZT4wMKErdcT9TQCgK//mFViaPYM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fcjdBAE6Fk++P8Mfc8uU1jheg9FXMBRAUpL9Zl1iWom6xmFhG6bcfB3HqdqzZrVE+ Pkk5acvmxmU8QbWUnoymbmcXcFTp2ZCtOzg9eBpfG8f086vPuwqGPMUHlk0ZidiNTx C1ePtDTEp3L3x0xpO23su4+FaB6QFDL5TGHoNEI4= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Dave Chinner , Christoph Hellwig , Brian Foster , "Darrick J. Wong" , Sasha Levin Subject: [PATCH 5.4 022/388] xfs: properly serialise fallocate against AIO+DIO Date: Tue, 29 Sep 2020 12:55:53 +0200 Message-Id: <20200929110011.557776965@linuxfoundation.org> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200929110010.467764689@linuxfoundation.org> References: <20200929110010.467764689@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Chinner [ Upstream commit 249bd9087a5264d2b8a974081870e2e27671b4dc ] AIO+DIO can extend the file size on IO completion, and it holds no inode locks while the IO is in flight. Therefore, a race condition exists in file size updates if we do something like this: aio-thread fallocate-thread lock inode submit IO beyond inode->i_size unlock inode ..... lock inode break layouts if (off + len > inode->i_size) new_size = off + len ..... inode_dio_wait() ..... completes inode->i_size updated inode_dio_done() .... if (new_size) xfs_vn_setattr(inode, new_size) Yup, that attempt to extend the file size in the fallocate code turns into a truncate - it removes the whatever the aio write allocated and put to disk, and reduced the inode size back down to where the fallocate operation ends. Fundamentally, xfs_file_fallocate() not compatible with racing AIO+DIO completions, so we need to move the inode_dio_wait() call up to where the lock the inode and break the layouts. Secondly, storing the inode size and then using it unchecked without holding the ILOCK is not safe; we can only do such a thing if we've locked out and drained all IO and other modification operations, which we don't do initially in xfs_file_fallocate. It should be noted that some of the fallocate operations are compound operations - they are made up of multiple manipulations that may zero data, and so we may need to flush and invalidate the file multiple times during an operation. However, we only need to lock out IO and other space manipulation operations once, as that lockout is maintained until the entire fallocate operation has been completed. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin --- fs/xfs/xfs_bmap_util.c | 8 +------- fs/xfs/xfs_file.c | 30 ++++++++++++++++++++++++++++++ fs/xfs/xfs_ioctl.c | 1 + 3 files changed, 32 insertions(+), 7 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 0c71acc1b8317..d6d78e1276254 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1039,6 +1039,7 @@ out_trans_cancel: goto out_unlock; } +/* Caller must first wait for the completion of any pending DIOs if required. */ int xfs_flush_unmap_range( struct xfs_inode *ip, @@ -1050,9 +1051,6 @@ xfs_flush_unmap_range( xfs_off_t rounding, start, end; int error; - /* wait for the completion of any pending DIOs */ - inode_dio_wait(inode); - rounding = max_t(xfs_off_t, 1 << mp->m_sb.sb_blocklog, PAGE_SIZE); start = round_down(offset, rounding); end = round_up(offset + len, rounding) - 1; @@ -1084,10 +1082,6 @@ xfs_free_file_space( if (len <= 0) /* if nothing being freed */ return 0; - error = xfs_flush_unmap_range(ip, offset, len); - if (error) - return error; - startoffset_fsb = XFS_B_TO_FSB(mp, offset); endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len); diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 1e2176190c86f..203065a647652 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -818,6 +818,36 @@ xfs_file_fallocate( if (error) goto out_unlock; + /* + * Must wait for all AIO to complete before we continue as AIO can + * change the file size on completion without holding any locks we + * currently hold. We must do this first because AIO can update both + * the on disk and in memory inode sizes, and the operations that follow + * require the in-memory size to be fully up-to-date. + */ + inode_dio_wait(inode); + + /* + * Now AIO and DIO has drained we flush and (if necessary) invalidate + * the cached range over the first operation we are about to run. + * + * We care about zero and collapse here because they both run a hole + * punch over the range first. Because that can zero data, and the range + * of invalidation for the shift operations is much larger, we still do + * the required flush for collapse in xfs_prepare_shift(). + * + * Insert has the same range requirements as collapse, and we extend the + * file first which can zero data. Hence insert has the same + * flush/invalidate requirements as collapse and so they are both + * handled at the right time by xfs_prepare_shift(). + */ + if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE | + FALLOC_FL_COLLAPSE_RANGE)) { + error = xfs_flush_unmap_range(ip, offset, len); + if (error) + goto out_unlock; + } + if (mode & FALLOC_FL_PUNCH_HOLE) { error = xfs_free_file_space(ip, offset, len); if (error) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index c93c4b7328ef7..60c4526312771 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -622,6 +622,7 @@ xfs_ioc_space( error = xfs_break_layouts(inode, &iolock, BREAK_UNMAP); if (error) goto out_unlock; + inode_dio_wait(inode); switch (bf->l_whence) { case 0: /*SEEK_SET*/ -- 2.25.1