From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Filipe Manana, David Sterba
Subject: [PATCH 5.2 369/413] Btrfs: fix fsync not persisting dentry deletions due to inode evictions
Date: Wed, 24 Jul 2019 21:21:00 +0200
Message-Id: <20190724191801.735580852@linuxfoundation.org>
In-Reply-To: <20190724191735.096702571@linuxfoundation.org>
References: <20190724191735.096702571@linuxfoundation.org>

From: Filipe Manana

commit 803f0f64d17769071d7287d9e3e3b79a3e1ae937 upstream.

In order to avoid searches on a log tree when unlinking an inode, we check
if the inode being unlinked was logged in the current transaction, as well
as the inode of its parent directory. When any of the inodes are logged,
we proceed to delete directory items and inode reference items from the
log, to ensure that a subsequent fsync of only the inode being unlinked,
or only of the parent directory, does not result in the entry still
existing after a power failure.

That check however is not reliable when one of the inodes involved (the
one being unlinked or its parent directory's inode) is evicted, since the
logged_trans field is transient, that is, it is not stored on disk, so it
is lost when the inode is evicted and reset to zero when the inode is
loaded into memory again. As a consequence, the checks currently done by
btrfs_del_dir_entries_in_log() and btrfs_del_inode_ref_in_log() always
evaluate to true if the inode was evicted before, regardless of whether
the inode was logged in the current transaction, so both functions return
early without touching the log. This results in the unlinked dentry still
existing after a log replay if, after the unlink operation, only one of
the inodes involved is fsync'ed.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ mkdir /mnt/dir
  $ touch /mnt/dir/foo
  $ xfs_io -c fsync /mnt/dir/foo

  # Keep an open file descriptor on our directory while we evict inodes.
  # We just want to evict the file's inode, the directory's inode must not
  # be evicted.
  $ ( cd /mnt/dir; while true; do :; done ) &
  $ pid=$!
  # Wait a bit to give time to background process to chdir to our test
  # directory.
  $ sleep 0.5

  # Trigger eviction of the file's inode.
  $ echo 2 > /proc/sys/vm/drop_caches

  # Unlink our file and fsync the parent directory. After a power failure
  # we don't expect to see the file anymore, since we fsync'ed the parent
  # directory.
  $ rm -f /mnt/dir/foo
  $ xfs_io -c fsync /mnt/dir

  <power failure>

  $ mount /dev/sdb /mnt
  $ ls /mnt/dir
  foo
  $
   --> file still there, unlink not persisted despite explicit fsync on dir

Fix this by also checking if the inode has the full_sync bit set in its
runtime flags, since that bit is set every time an inode is loaded from
disk, or for other less common cases such as after a shrinking truncate
or failure to allocate extent maps for holes, and gets cleared after the
first fsync. Also consider the inode as possibly logged only if it was
last modified in the current transaction (besides having the full_sync
flag set).

Fixes: 3a5f1d458ad161 ("Btrfs: Optimize btree walking while logging inodes")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/tree-log.c |   28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3323,6 +3323,30 @@ int btrfs_free_log_root_tree(struct btrf
 }
 
 /*
+ * Check if an inode was logged in the current transaction. We can't always rely
+ * on an inode's logged_trans value, because it's an in-memory only field and
+ * therefore not persisted. This means that its value is lost if the inode gets
+ * evicted and loaded again from disk (in which case it has a value of 0, and
+ * certainly it is smaller than any possible transaction ID), when that happens
+ * the full_sync flag is set in the inode's runtime flags, so on that case we
+ * assume eviction happened and ignore the logged_trans value, assuming the
+ * worst case, that the inode was logged before in the current transaction.
+ */
+static bool inode_logged(struct btrfs_trans_handle *trans,
+			 struct btrfs_inode *inode)
+{
+	if (inode->logged_trans == trans->transid)
+		return true;
+
+	if (inode->last_trans == trans->transid &&
+	    test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags) &&
+	    !test_bit(BTRFS_FS_LOG_RECOVERING, &trans->fs_info->flags))
+		return true;
+
+	return false;
+}
+
+/*
  * If both a file and directory are logged, and unlinks or renames are
  * mixed in, we have a few interesting corners:
  *
@@ -3356,7 +3380,7 @@ int btrfs_del_dir_entries_in_log(struct
 	int bytes_del = 0;
 	u64 dir_ino = btrfs_ino(dir);
 
-	if (dir->logged_trans < trans->transid)
+	if (!inode_logged(trans, dir))
 		return 0;
 
 	ret = join_running_log_trans(root);
@@ -3460,7 +3484,7 @@ int btrfs_del_inode_ref_in_log(struct bt
 	u64 index;
 	int ret;
 
-	if (inode->logged_trans < trans->transid)
+	if (!inode_logged(trans, inode))
 		return 0;
 
 	ret = join_running_log_trans(root);
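
For readers who want to see why the old comparison misses the evicted
case, the following is a rough userspace sketch of the decision logic
only. It is not kernel code: struct model_inode, model_fsync(),
model_evict_and_reload(), old_check(), new_check() and
scenario_considered_logged() are hypothetical names invented for this
illustration, and the model assumes that last_trans survives a reload
from disk while logged_trans does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the btrfs_inode fields used by the check. */
struct model_inode {
	uint64_t logged_trans;  /* in-memory only, reads as 0 after a reload */
	uint64_t last_trans;    /* assumed to be restored from disk on reload */
	bool needs_full_sync;   /* models BTRFS_INODE_NEEDS_FULL_SYNC */
};

/* Model of an fsync: the inode is logged and the full sync bit cleared. */
static void model_fsync(struct model_inode *inode, uint64_t transid)
{
	inode->logged_trans = transid;
	inode->needs_full_sync = false;
}

/* Model of eviction followed by a reload of the inode from disk. */
static void model_evict_and_reload(struct model_inode *inode)
{
	inode->logged_trans = 0;
	inode->needs_full_sync = true;
}

/* Old behaviour: treated as logged unless logged_trans < transid. */
static bool old_check(const struct model_inode *inode, uint64_t transid)
{
	return !(inode->logged_trans < transid);
}

/* New behaviour, mirroring the shape of inode_logged() in the patch. */
static bool new_check(const struct model_inode *inode, uint64_t transid,
		      bool log_recovering)
{
	if (inode->logged_trans == transid)
		return true;

	if (inode->last_trans == transid && inode->needs_full_sync &&
	    !log_recovering)
		return true;

	return false;
}

/*
 * Replays the sequence from the changelog: the inode is modified and
 * fsync'ed in transaction 7, then evicted and reloaded, then checked.
 */
static bool scenario_considered_logged(bool use_new_check)
{
	const uint64_t transid = 7;
	struct model_inode file = {
		.logged_trans = 0,
		.last_trans = transid,
		.needs_full_sync = true,
	};

	model_fsync(&file, transid);
	model_evict_and_reload(&file);

	return use_new_check ? new_check(&file, transid, false)
			     : old_check(&file, transid);
}
```

In this model scenario_considered_logged(false) reports the inode as not
logged, which corresponds to the early return that leaves the stale
entry in the log, while scenario_considered_logged(true) reports it as
possibly logged. The last_trans comparison keeps the fallback from
firing for inodes that were not modified in the current transaction.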