Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp3733622imu; Mon, 7 Jan 2019 08:29:56 -0800 (PST) X-Google-Smtp-Source: ALg8bN6ND85Bngn6icNVLwbsb059xlz53CYjGoE3lJjxkQ9gkQhToMnb9EY5REIqea2lTIWSGFRp X-Received: by 2002:a63:6506:: with SMTP id z6mr11494009pgb.334.1546878596838; Mon, 07 Jan 2019 08:29:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1546878596; cv=none; d=google.com; s=arc-20160816; b=Hopi2pgY7BDezxZrRheGm4F5sstT1f0WmUJAk0792AbWGRH4SOBvkQBC0zemk5BT4n MVT6oyM7pZwJZiv1uyR3DaKLKy9zpwEGglAiS4cqVLgT3ZNV6+2uvYAE0qT7nhKaX8n5 js1We9R89hVArUUYXcxKCOAxGBPJJAVlX3AjmBbn/EfFLJg0ft0i+r7GMcUK0Qa3FQuB cokVif0L0Whoq1VU9CeUEW9be6tn2kpKYfPbNb6dXAA6fgfHiQqeUAKI07gYxj6t4TVF FVsCWB9+zg6BLIAzdvQjkZrErshObDQMmlcjUgDjneaUNgkdJOqSbpnBKHnOBQC4wLfN PvCg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=H+6MIdpNjw1v9lae0gd3mWOLYJ0soGe5xm7F6+iSiA8=; b=cV+sdKlnH1Q2ZG5+VDH7c5iq8o7/079x7ANrXhscdQi1uUbsOS70y4nnBMbhuihAXy hLCU0N5T5qa73k4KuskFNAPvyaZCL0j94IEh6XUkr6CJ4TLjB8l6GkFsk91bcxKjboO5 gIHnusuwTZf04Kp398ZMEKgCm8igU32ksQbhF7XC6H0p+5w88yyC1Poja7crvFHgztmA GNHj0fAxbX0D1jiByhXaWbnvnVeJPTyV/Aomr2h+JivAjO52/zSNPTMbdSt0q37ud2kr OKw3ZFKMrLLqg1eyqQLpl8yfaz3+AwIKpb8jgoakEk+UBgabg5U/MIG8mDDkMeVQloJT B5+w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=ybBEXTLb; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u9si15768445plr.157.2019.01.07.08.29.42; Mon, 07 Jan 2019 08:29:56 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=ybBEXTLb; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729077AbfAGNWV (ORCPT + 99 others); Mon, 7 Jan 2019 08:22:21 -0500 Received: from mail.kernel.org ([198.145.29.99]:32794 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728722AbfAGMoV (ORCPT ); Mon, 7 Jan 2019 07:44:21 -0500 Received: from localhost (5356596B.cm-6-7b.dynamic.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 16BB020665; Mon, 7 Jan 2019 12:44:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1546865059; bh=U7CtIwnvf2qH9o3JLljjes70nUP+Dc9uGGaOE0fErlI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ybBEXTLb9bVlMBN/MfNUuUkeKsS60t5UsSvUu/1p7oUz3jVHYogdy3//3GATuBlEc 6B9DRfL6QYNxjTiF5V0B++PJNqdWudA/oZDOvoqOUgY6jU1azFxKcSf3s+DVEElM51 wnzmn6ss7oEaFIA1bpCPQEwaal2URcl+vl1GkGq0= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Vijay Chidambaram , Jayashree Mohan , Filipe Manana , David Sterba Subject: [PATCH 4.20 096/145] Btrfs: fix fsync of files with multiple hard links in new directories Date: Mon, 7 Jan 2019 13:32:13 +0100 Message-Id: <20190107104449.786671798@linuxfoundation.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190107104437.308206189@linuxfoundation.org> References: <20190107104437.308206189@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review X-Patchwork-Hint: ignore MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.20-stable review patch. If anyone has any objections, please let me know. ------------------ From: Filipe Manana commit 41bd60676923822de1df2c50b3f9a10171f4338a upstream. The log tree has a long standing problem that when a file is fsync'ed we only check for new ancestors, created in the current transaction, by following only the hard link for which the fsync was issued. We follow the ancestors using the VFS' dget_parent() API. This means that if we create a new link for a file in a directory that is new (or in an any other new ancestor directory) and then fsync the file using an old hard link, we end up not logging the new ancestor, and on log replay that new hard link and ancestor do not exist. In some cases, involving renames, the file will not exist at all. Example: mkfs.btrfs -f /dev/sdb mount /dev/sdb /mnt mkdir /mnt/A touch /mnt/foo ln /mnt/foo /mnt/A/bar xfs_io -c fsync /mnt/foo In this example after log replay only the hard link named 'foo' exists and directory A does not exist, which is unexpected. In other major linux filesystems, such as ext4, xfs and f2fs for example, both hard links exist and so does directory A after mounting again the filesystem. Checking if any new ancestors are new and need to be logged was added in 2009 by commit 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes"), however only for the ancestors of the hard link (dentry) for which the fsync was issued, instead of checking for all ancestors for all of the inode's hard links. So fix this by tracking the id of the last transaction where a hard link was created for an inode and then on fsync fallback to a full transaction commit when an inode has more than one hard link and at least one new hard link was created in the current transaction. This is the simplest solution since this is not a common use case (adding frequently hard links for which there's an ancestor created in the current transaction and then fsync the file). In case it ever becomes a common use case, a solution that consists of iterating the fs/subvol btree for each hard link and check if any ancestor is new, could be implemented. This solves many unexpected scenarios reported by Jayashree Mohan and Vijay Chidambaram, and for which there is a new test case for fstests under review. Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes") CC: stable@vger.kernel.org # 4.4+ Reported-by: Vijay Chidambaram Reported-by: Jayashree Mohan Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/btrfs_inode.h | 6 ++++++ fs/btrfs/inode.c | 17 +++++++++++++++++ fs/btrfs/tree-log.c | 16 ++++++++++++++++ 3 files changed, 39 insertions(+) --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -147,6 +147,12 @@ struct btrfs_inode { u64 last_unlink_trans; /* + * Track the transaction id of the last transaction used to create a + * hard link for the inode. This is used by the log tree (fsync). + */ + u64 last_link_trans; + + /* * Number of bytes outstanding that are going to need csums. This is * used in ENOSPC accounting. */ --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3687,6 +3687,21 @@ cache_index: * inode is not a directory, logging its parent unnecessarily. */ BTRFS_I(inode)->last_unlink_trans = BTRFS_I(inode)->last_trans; + /* + * Similar reasoning for last_link_trans, needs to be set otherwise + * for a case like the following: + * + * mkdir A + * touch foo + * ln foo A/bar + * echo 2 > /proc/sys/vm/drop_caches + * fsync foo + * + * + * Would result in link bar and directory A not existing after the power + * failure. + */ + BTRFS_I(inode)->last_link_trans = BTRFS_I(inode)->last_trans; path->slots[0]++; if (inode->i_nlink != 1 || @@ -6626,6 +6641,7 @@ static int btrfs_link(struct dentry *old if (err) goto fail; } + BTRFS_I(inode)->last_link_trans = trans->transid; d_instantiate(dentry, inode); ret = btrfs_log_new_name(trans, BTRFS_I(inode), NULL, parent, true, NULL); @@ -9158,6 +9174,7 @@ struct inode *btrfs_alloc_inode(struct s ei->index_cnt = (u64)-1; ei->dir_index = 0; ei->last_unlink_trans = 0; + ei->last_link_trans = 0; ei->last_log_commit = 0; spin_lock_init(&ei->lock); --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -5778,6 +5778,22 @@ static int btrfs_log_inode_parent(struct goto end_trans; } + /* + * If a new hard link was added to the inode in the current transaction + * and its link count is now greater than 1, we need to fallback to a + * transaction commit, otherwise we can end up not logging all its new + * parents for all the hard links. Here just from the dentry used to + * fsync, we can not visit the ancestor inodes for all the other hard + * links to figure out if any is new, so we fallback to a transaction + * commit (instead of adding a lot of complexity of scanning a btree, + * since this scenario is not a common use case). + */ + if (inode->vfs_inode.i_nlink > 1 && + inode->last_link_trans > last_committed) { + ret = -EMLINK; + goto end_trans; + } + while (1) { if (!parent || d_really_is_negative(parent) || sb != parent->d_sb) break;