Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp10428206ybi; Wed, 24 Jul 2019 23:02:57 -0700 (PDT) X-Google-Smtp-Source: APXvYqxymb0fqw+TvNX0uhTsu1I5je3vByVmd5IDSH6IjL8RfQ5V9BTXJ97BuTIKpR/y8g8jSL84 X-Received: by 2002:a65:4189:: with SMTP id a9mr58079359pgq.399.1564034577040; Wed, 24 Jul 2019 23:02:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564034577; cv=none; d=google.com; s=arc-20160816; b=km7BF7IPXzCwhhwa1RgNadf3BULboQ57+quaxiB6YoMyqZkmR4hCJ10Q186FhxLhx5 x3Jv4sbDjJclbuwn+vV98SYmlijCbQB+LXPBdAfEE7Y8RJybREPx7XN9FQ7LytFdm+C7 oim6cupn11oz5rTgDvxr8ZhOgMbUEOTpcaGYfDWLc+t/tHyZ2L4rY11gReztZt1ja4BT g5g88uxx8ftAGDU5h1vUNpUrrmMoNfRY0zGcanlIeui1PeE1bougBdJgOGkmA3jd08IB 7JTo7wqBuucIfSsaTJVCsCRzwKHxt7uViVtBL13vDVzDH9qwK9Dwr0gi7E8aSFPiW3XM cnJA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=vBae86mBp31/NyxqiBcUf/vbODnDFyWzNkk/pZnPI+g=; b=BjZwbTLjK1xVgykPu5hE1mcx5Fq+yKwPs1dn0jgdxfNkVDLWYgdEdM2YoGW1uFJTe7 YgN/VY6pbA1ssvFHIqa8v+mDQHKml2hfNOoPaFf2X/NlZGKYgeTemJR/7rPlb+ef6GNh VMdX6pX0U1xhy5JyhZSAEgXCvkMMD6hEmpkRrzFiu8AHnCMC/G7PLXDK7sJIaNn6Ru5T gVJtdpNMJ27JyENW2NEZkiOYEtXRyF1vUYATfb0mGMQ1fMu3+wsBDuqA56hOXuMqrbO+ /K4k8n5qFjUb98v2IepMyiZ4xPtRPEbGjCZBN9aSdfhoeC6iPTUEmkV5O9deQ8EDZzn3 K0Tg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Fzm8kFz7; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f20si19230650pfn.166.2019.07.24.23.02.37; Wed, 24 Jul 2019 23:02:57 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Fzm8kFz7; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405111AbfGYFpI (ORCPT + 99 others); Thu, 25 Jul 2019 01:45:08 -0400 Received: from mail.kernel.org ([198.145.29.99]:60528 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405094AbfGYFpH (ORCPT ); Thu, 25 Jul 2019 01:45:07 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0E5AA22BEB; Thu, 25 Jul 2019 05:45:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1564033506; bh=qqUFRMmHPCLqthnTJ0vlPoMPjMW60zHjRn37ub9NQWU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Fzm8kFz73IotMblFJwRgD3w4KgkOW4eM/t9Ir+1mxwys2fpdwwDRPWO+boCb/hIKr wjSppI3hfRw42rqOzqRJtL0Bf7gW+9i2CDFVm3hMyYBfZWZOeDMTVNoQCHr6Gd1Rks Y/KalF90B+LRv+NBnIC/AZVPqZgCsNByHiUEpUCw= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Filipe Manana , David Sterba Subject: [PATCH 4.19 232/271] Btrfs: fix data loss after inode eviction, renaming it, and fsync it Date: Wed, 24 Jul 2019 21:21:41 +0200 Message-Id: <20190724191715.039653081@linuxfoundation.org> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20190724191655.268628197@linuxfoundation.org> References: <20190724191655.268628197@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Filipe Manana commit d1d832a0b51dd9570429bb4b81b2a6c1759e681a upstream. When we log an inode, regardless of logging it completely or only that it exists, we always update it as logged (logged_trans and last_log_commit fields of the inode are updated). This is generally fine and avoids future attempts to log it from having to do repeated work that brings no value. However, if we write data to a file, then evict its inode after all the dealloc was flushed (and ordered extents completed), rename the file and fsync it, we end up not logging the new extents, since the rename may result in logging that the inode exists in case the parent directory was logged before. The following reproducer shows and explains how this can happen: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ touch /mnt/dir/foo $ touch /mnt/dir/bar # Do a direct IO write instead of a buffered write because with a # buffered write we would need to make sure dealloc gets flushed and # complete before we do the inode eviction later, and we can not do that # from user space with call to things such as sync(2) since that results # in a transaction commit as well. $ xfs_io -d -c "pwrite -S 0xd3 0 4K" /mnt/dir/bar # Keep the directory dir in use while we evict inodes. We want our file # bar's inode to be evicted but we don't want our directory's inode to # be evicted (if it were evicted too, we would not be able to reproduce # the issue since the first fsync below, of file foo, would result in a # transaction commit. $ ( cd /mnt/dir; while true; do :; done ) & $ pid=$! # Wait a bit to give time for the background process to chdir. $ sleep 0.1 # Evict all inodes, except the inode for the directory dir because it is # currently in use by our background process. $ echo 2 > /proc/sys/vm/drop_caches # fsync file foo, which ends up persisting information about the parent # directory because it is a new inode. $ xfs_io -c fsync /mnt/dir/foo # Rename bar, this results in logging that this inode exists (inode item, # names, xattrs) because the parent directory is in the log. $ mv /mnt/dir/bar /mnt/dir/baz # Now fsync baz, which ends up doing absolutely nothing because of the # rename operation which logged that the inode exists only. $ xfs_io -c fsync /mnt/dir/baz $ mount /dev/sdb /mnt $ od -t x1 -A d /mnt/dir/baz 0000000 --> Empty file, data we wrote is missing. Fix this by not updating last_sub_trans of an inode when we are logging only that it exists and the inode was not yet logged since it was loaded from disk (full_sync bit set), this is enough to make btrfs_inode_in_log() return false for this scenario and make us log the inode. The logged_trans of the inode is still always setsince that alone is used to track if names need to be deleted as part of unlink operations. Fixes: 257c62e1bce03e ("Btrfs: avoid tree log commit when there are no changes") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/tree-log.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -5250,9 +5250,19 @@ log_extents: } } + /* + * Don't update last_log_commit if we logged that an inode exists after + * it was loaded to memory (full_sync bit set). + * This is to prevent data loss when we do a write to the inode, then + * the inode gets evicted after all delalloc was flushed, then we log + * it exists (due to a rename for example) and then fsync it. This last + * fsync would do nothing (not logging the extents previously written). + */ spin_lock(&inode->lock); inode->logged_trans = trans->transid; - inode->last_log_commit = inode->last_sub_trans; + if (inode_only != LOG_INODE_EXISTS || + !test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags)) + inode->last_log_commit = inode->last_sub_trans; spin_unlock(&inode->lock); out_unlock: mutex_unlock(&inode->log_mutex);