From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Martin Wilck, Filipe Manana, David Sterba
Subject: [PATCH 4.18 019/123] Btrfs: fix send failure when root has deleted files still open
Date: Mon, 3 Sep 2018 18:56:03 +0200
Message-Id: <20180903165720.310967323@linuxfoundation.org>
In-Reply-To: <20180903165719.499675257@linuxfoundation.org>
References: <20180903165719.499675257@linuxfoundation.org>

4.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana

commit 46b2f4590aab71d31088a265c86026b1e96c9de4 upstream.

The more common use case of send involves creating a RO snapshot and then
using it for a send operation. In this case it's not possible to have inodes
in the snapshot that have a link count of zero (inode with an orphan item)
since during snapshot creation we do the orphan cleanup. However, other less
common use cases for send can end up seeing inodes with a link count of zero
and in this case the send operation fails with an ENOENT error because any
attempt to generate a path for the inode, with the purpose of creating it or
updating it at the receiver, fails since there are no inode reference items.
One use case is to use a regular subvolume for a send operation after
turning it to RO mode or turning a RW snapshot into RO mode and then using it
for a send operation. In both cases, if a file gets all its hard links deleted
while there is an open file descriptor before turning the subvolume/snapshot
into RO mode, the send operation will encounter an inode with a link count of
zero and then fail with errno ENOENT.

Example using a full send with a subvolume:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ btrfs subvolume create /mnt/sv1
  $ touch /mnt/sv1/foo
  $ touch /mnt/sv1/bar

  # keep an open file descriptor on file bar
  $ exec 73> /mnt/sv1/bar

  $ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap2

  # Turn the second snapshot to RW mode and delete file foo while
  # holding an open file descriptor on it.
  $ btrfs property set /mnt/snap2 ro false
  $ exec 73

Signed-off-by: Filipe Manana
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/send.c |  137 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 129 insertions(+), 8 deletions(-)

--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -100,6 +100,7 @@ struct send_ctx {
 	u64 cur_inode_rdev;
 	u64 cur_inode_last_extent;
 	u64 cur_inode_next_write_offset;
+	bool ignore_cur_inode;
 
 	u64 send_progress;
 
@@ -5799,6 +5800,9 @@ static int finish_inode_if_needed(struct
 	int pending_move = 0;
 	int refs_processed = 0;
 
+	if (sctx->ignore_cur_inode)
+		return 0;
+
 	ret = process_recorded_refs_if_needed(sctx, at_end, &pending_move,
 					      &refs_processed);
 	if (ret < 0)
@@ -5917,6 +5921,93 @@ out:
 	return ret;
 }
 
+struct parent_paths_ctx {
+	struct list_head *refs;
+	struct send_ctx *sctx;
+};
+
+static int record_parent_ref(int num, u64 dir, int index, struct fs_path *name,
+			     void *ctx)
+{
+	struct parent_paths_ctx *ppctx = ctx;
+
+	return record_ref(ppctx->sctx->parent_root, dir, name, ppctx->sctx,
+			  ppctx->refs);
+}
+
+/*
+ * Issue unlink operations for all paths of the current inode found in the
+ * parent snapshot.
+ */
+static int btrfs_unlink_all_paths(struct send_ctx *sctx)
+{
+	LIST_HEAD(deleted_refs);
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct parent_paths_ctx ctx;
+	int ret;
+
+	path = alloc_path_for_send();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = sctx->cur_ino;
+	key.type = BTRFS_INODE_REF_KEY;
+	key.offset = 0;
+	ret = btrfs_search_slot(NULL, sctx->parent_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	ctx.refs = &deleted_refs;
+	ctx.sctx = sctx;
+
+	while (true) {
+		struct extent_buffer *eb = path->nodes[0];
+		int slot = path->slots[0];
+
+		if (slot >= btrfs_header_nritems(eb)) {
+			ret = btrfs_next_leaf(sctx->parent_root, path);
+			if (ret < 0)
+				goto out;
+			else if (ret > 0)
+				break;
+			continue;
+		}
+
+		btrfs_item_key_to_cpu(eb, &key, slot);
+		if (key.objectid != sctx->cur_ino)
+			break;
+		if (key.type != BTRFS_INODE_REF_KEY &&
+		    key.type != BTRFS_INODE_EXTREF_KEY)
+			break;
+
+		ret = iterate_inode_ref(sctx->parent_root, path, &key, 1,
+					record_parent_ref, &ctx);
+		if (ret < 0)
+			goto out;
+
+		path->slots[0]++;
+	}
+
+	while (!list_empty(&deleted_refs)) {
+		struct recorded_ref *ref;
+
+		ref = list_first_entry(&deleted_refs, struct recorded_ref, list);
+		ret = send_unlink(sctx, ref->full_path);
+		if (ret < 0)
+			goto out;
+		fs_path_free(ref->full_path);
+		list_del(&ref->list);
+		kfree(ref);
+	}
+	ret = 0;
+out:
+	btrfs_free_path(path);
+	if (ret)
+		__free_recorded_refs(&deleted_refs);
+	return ret;
+}
+
 static int changed_inode(struct send_ctx *sctx,
 			 enum btrfs_compare_tree_result result)
 {
@@ -5931,6 +6022,7 @@ static int changed_inode(struct send_ctx
 	sctx->cur_inode_new_gen = 0;
 	sctx->cur_inode_last_extent = (u64)-1;
 	sctx->cur_inode_next_write_offset = 0;
+	sctx->ignore_cur_inode = false;
 
 	/*
 	 * Set send_progress to current inode. This will tell all get_cur_xxx
@@ -5971,6 +6063,33 @@ static int changed_inode(struct send_ctx
 		sctx->cur_inode_new_gen = 1;
 	}
 
+	/*
+	 * Normally we do not find inodes with a link count of zero (orphans)
+	 * because the most common case is to create a snapshot and use it
+	 * for a send operation. However other less common use cases involve
+	 * using a subvolume and send it after turning it to RO mode just
+	 * after deleting all hard links of a file while holding an open
+	 * file descriptor against it or turning a RO snapshot into RW mode,
+	 * keep an open file descriptor against a file, delete it and then
+	 * turn the snapshot back to RO mode before using it for a send
+	 * operation. So if we find such cases, ignore the inode and all its
+	 * items completely if it's a new inode, or if it's a changed inode
+	 * make sure all its previous paths (from the parent snapshot) are all
+	 * unlinked and all other the inode items are ignored.
+	 */
+	if (result == BTRFS_COMPARE_TREE_NEW ||
+	    result == BTRFS_COMPARE_TREE_CHANGED) {
+		u32 nlinks;
+
+		nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii);
+		if (nlinks == 0) {
+			sctx->ignore_cur_inode = true;
+			if (result == BTRFS_COMPARE_TREE_CHANGED)
+				ret = btrfs_unlink_all_paths(sctx);
+			goto out;
+		}
+	}
+
 	if (result == BTRFS_COMPARE_TREE_NEW) {
 		sctx->cur_inode_gen = left_gen;
 		sctx->cur_inode_new = 1;
@@ -6309,15 +6428,17 @@ static int changed_cb(struct btrfs_path
 	    key->objectid == BTRFS_FREE_SPACE_OBJECTID)
 		goto out;
 
-	if (key->type == BTRFS_INODE_ITEM_KEY)
+	if (key->type == BTRFS_INODE_ITEM_KEY) {
 		ret = changed_inode(sctx, result);
-	else if (key->type == BTRFS_INODE_REF_KEY ||
-		 key->type == BTRFS_INODE_EXTREF_KEY)
-		ret = changed_ref(sctx, result);
-	else if (key->type == BTRFS_XATTR_ITEM_KEY)
-		ret = changed_xattr(sctx, result);
-	else if (key->type == BTRFS_EXTENT_DATA_KEY)
-		ret = changed_extent(sctx, result);
+	} else if (!sctx->ignore_cur_inode) {
+		if (key->type == BTRFS_INODE_REF_KEY ||
+		    key->type == BTRFS_INODE_EXTREF_KEY)
+			ret = changed_ref(sctx, result);
+		else if (key->type == BTRFS_XATTR_ITEM_KEY)
+			ret = changed_xattr(sctx, result);
+		else if (key->type == BTRFS_EXTENT_DATA_KEY)
+			ret = changed_extent(sctx, result);
+	}
 
 out:
 	return ret;
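
Note (not part of the patch): the condition that triggers the failure
described in the changelog, a file whose hard links are all gone while a
descriptor is still open, can be observed from userspace on most local
filesystems. The following is only a minimal sketch of that precondition.
The helper name nlink_after_unlink() and the /tmp template are made up for
illustration, and nothing here exercises btrfs send itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Create a temporary file, remove its only name while keeping the file
 * descriptor open, and return the link count that fstat() reports for
 * the still-open inode. Returns -1 on any error.
 */
long nlink_after_unlink(void)
{
	char tmpl[] = "/tmp/orphan-demo-XXXXXX";	/* assumed scratch location */
	struct stat st;
	int fd;

	fd = mkstemp(tmpl);
	if (fd < 0)
		return -1;

	/* Drop the only hard link; the inode lives on through fd. */
	if (unlink(tmpl) < 0) {
		close(fd);
		return -1;
	}

	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}

	close(fd);
	return (long)st.st_nlink;
}
```

On a typical local filesystem the helper is expected to return 0. That is
the same state, a link count of zero on a still-open inode, that
changed_inode() tests for through btrfs_inode_nlink() in the patch above.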