From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Josef Bacik, Filipe Manana, David Sterba
Subject: [PATCH 4.19 076/139] Btrfs: fix rare chances for data loss when doing a fast fsync
Date: Tue, 4 Dec 2018 11:49:17 +0100
Message-Id: <20181204103653.251938373@linuxfoundation.org>
X-Mailer: git-send-email 2.19.2
In-Reply-To:
 <20181204103649.950154335@linuxfoundation.org>
References: <20181204103649.950154335@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
X-Patchwork-Hint: ignore
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana

commit aab15e8ec25765cf7968c72cbec7583acf99d8a4 upstream.

After the simplification of the fast fsync patch done recently by commit
b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time") and
commit e7175a692765 ("btrfs: remove the wait ordered logic in the
log_one_extent path"), we got a very short time window where we can get
extents logged without writeback completing first or extents logged
without logging the respective data checksums. Both issues can only happen
when doing a non-full (fast) fsync.

As soon as we enter btrfs_sync_file() we trigger writeback, then lock the
inode and then wait for the writeback to complete before starting to log
the inode. However before we acquire the inode's lock and after we started
writeback, it's possible that more writes happened and dirtied more pages.
If that happened and those pages get writeback triggered while we are
logging the inode (for example, the VM subsystem triggering it due to
memory pressure, or another concurrent fsync), we end up seeing the
respective extent maps in the inode's list of modified extents and will
log matching file extent items without waiting for the respective
ordered extents to complete, meaning that either of the following will
happen:

1) We log an extent after its writeback finishes but before its checksums
   are added to the csum tree, leading to -EIO errors when attempting to
   read the extent after a log replay.

2) We log an extent before its writeback finishes.
   Therefore after the log replay we will have a file extent item pointing
   to an unwritten extent (and without the respective data checksums as
   well).

This could not happen before the fast fsync patch simplification, because
for any extent we found in the list of modified extents, we would wait for
its respective ordered extent to finish writeback or collect its checksums
for logging if it did not complete yet.

Fix this by triggering writeback again after acquiring the inode's lock
and before waiting for ordered extents to complete.

Fixes: e7175a692765 ("btrfs: remove the wait ordered logic in the log_one_extent path")
Fixes: b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik
Signed-off-by: Filipe Manana
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/file.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2089,6 +2089,30 @@ int btrfs_sync_file(struct file *file, l
 	atomic_inc(&root->log_batch);
 
 	/*
+	 * Before we acquired the inode's lock, someone may have dirtied more
+	 * pages in the target range. We need to make sure that writeback for
+	 * any such pages does not start while we are logging the inode, because
+	 * if it does, any of the following might happen when we are not doing a
+	 * full inode sync:
+	 *
+	 * 1) We log an extent after its writeback finishes but before its
+	 *    checksums are added to the csum tree, leading to -EIO errors
+	 *    when attempting to read the extent after a log replay.
+	 *
+	 * 2) We can end up logging an extent before its writeback finishes.
+	 *    Therefore after the log replay we will have a file extent item
+	 *    pointing to an unwritten extent (and no data checksums as well).
+	 *
+	 * So trigger writeback for any eventual new dirty pages and then we
+	 * wait for all ordered extents to complete below.
+	 */
+	ret = start_ordered_ops(inode, start, end);
+	if (ret) {
+		inode_unlock(inode);
+		goto out;
+	}
+
+	/*
 	 * We have to do this here to avoid the priority inversion of waiting on
 	 * IO of a lower priority task while holding a transaciton open.
 	 */
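
The ordering problem described in the changelog can be illustrated with a
small user-space toy model. This is only a sketch under simplifying
assumptions, not kernel code: every name below (toy_inode, toy_fast_fsync
and so on) is hypothetical, each page stands in for one extent, and the
racing write and the background writeback are placed at fixed points
instead of happening concurrently. It only shows why a second writeback
pass after taking the inode's lock closes the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

/* Hypothetical per-page state; one page stands in for one extent. */
enum toy_page_state {
	TOY_CLEAN,	/* nothing to write */
	TOY_DIRTY,	/* modified in memory, writeback not started */
	TOY_IN_FLIGHT,	/* writeback started, ordered extent not complete */
	TOY_COMPLETE	/* data written and checksums inserted */
};

struct toy_inode {
	enum toy_page_state pages[TOY_NPAGES];
};

/* Toy stand-in for start_ordered_ops(): start writeback on dirty pages. */
static void toy_start_writeback(struct toy_inode *inode)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		if (inode->pages[i] == TOY_DIRTY)
			inode->pages[i] = TOY_IN_FLIGHT;
}

/* Toy stand-in for waiting on ordered extents: finish what is in flight. */
static void toy_wait_ordered(struct toy_inode *inode)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		if (inode->pages[i] == TOY_IN_FLIGHT)
			inode->pages[i] = TOY_COMPLETE;
}

/* Toy logging step: count extents logged while still incomplete. */
static int toy_log_inode(const struct toy_inode *inode)
{
	int unsafe = 0;

	for (size_t i = 0; i < TOY_NPAGES; i++)
		if (inode->pages[i] == TOY_IN_FLIGHT)
			unsafe++;
	return unsafe;
}

/*
 * Replay the sequence from the changelog. Returns the number of extents
 * that were logged before their ordered extent completed.
 */
int toy_fast_fsync(struct toy_inode *inode, size_t racing_page, bool with_fix)
{
	assert(racing_page < TOY_NPAGES);

	toy_start_writeback(inode);		/* writeback on fsync entry */
	inode->pages[racing_page] = TOY_DIRTY;	/* write racing with the lock */
	/* The inode's lock would be taken here. */
	if (with_fix)
		toy_start_writeback(inode);	/* the added second pass */
	toy_wait_ordered(inode);
	toy_start_writeback(inode);		/* e.g. memory pressure while logging */
	return toy_log_inode(inode);
}
```

In this model, calling toy_fast_fsync() with with_fix set to false on an
inode that has one dirty page reports one unsafely logged extent (the page
dirtied in the race window), while the same call with with_fix set to true
reports none.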