From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Filipe Manana, David Sterba
Subject: [PATCH 4.9 27/42] Btrfs: fix incremental send failure after deduplication
Date: Mon, 5 Aug 2019 15:02:53 +0200
Message-Id: <20190805124928.190609239@linuxfoundation.org>
X-Mailer: git-send-email 2.22.0
In-Reply-To: <20190805124924.788666484@linuxfoundation.org>
References:
 <20190805124924.788666484@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Filipe Manana

commit b4f9a1a87a48c255bb90d8a6c3d555a1abb88130 upstream.

When doing an incremental send operation we can fail if we previously
did deduplication operations against a file that exists in both
snapshots. In that case we will fail the send operation with -EIO and
print a message to dmesg/syslog like the following:

  BTRFS error (device sdc): Send: inconsistent snapshot, found updated \
  extent for inode 257 without updated inode item, send root is 258, \
  parent root is 257

This requires that we deduplicate into the same file in both snapshots
the same number of times on each snapshot. The issue happens because a
deduplication only updates the iversion of an inode and does not update
any other field of the inode. Therefore, if we deduplicate the file on
each snapshot the same number of times, the inode will have the same
iversion value (stored as the "sequence" field in the inode item) in
both snapshots, so it will be seen as unchanged in the send snapshot
even though there are new/updated/deleted extent items when comparing to
the parent snapshot. This makes the send operation return -EIO and print
an error message.

Example reproducer:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  # Create our first file. The first half of the file has several 64Kb
  # extents while the second half has a single 512Kb extent.
  $ xfs_io -f -s -c "pwrite -S 0xb8 -b 64K 0 512K" /mnt/foo
  $ xfs_io -c "pwrite -S 0xb8 512K 512K" /mnt/foo

  # Create the base snapshot and the parent send stream from it.
  $ btrfs subvolume snapshot -r /mnt /mnt/mysnap1
  $ btrfs send -f /tmp/1.snap /mnt/mysnap1

  # Create our second file, that has exactly the same data as the first
  # file.
  $ xfs_io -f -c "pwrite -S 0xb8 0 1M" /mnt/bar

  # Create the second snapshot, used for the incremental send, before
  # doing the file deduplication.
  $ btrfs subvolume snapshot -r /mnt /mnt/mysnap2

  # Now before creating the incremental send stream:
  #
  # 1) Deduplicate into a subrange of file foo in snapshot mysnap1. This
  #    will drop several extent items and add a new one, also updating
  #    the inode's iversion (sequence field in inode item) by 1, but not
  #    any other field of the inode;
  #
  # 2) Deduplicate into a different subrange of file foo in snapshot
  #    mysnap2. This will replace an extent item with a new one, also
  #    updating the inode's iversion by 1 but not any other field of the
  #    inode.
  #
  # After these two deduplication operations, the inode items, for file
  # foo, are identical in both snapshots, but we have different extent
  # items for this inode in both snapshots. We want to check this doesn't
  # cause send to fail with an error or produce an incorrect stream.

  $ xfs_io -r -c "dedupe /mnt/bar 0 0 512K" /mnt/mysnap1/foo
  $ xfs_io -r -c "dedupe /mnt/bar 512K 512K 512K" /mnt/mysnap2/foo

  # Create the incremental send stream.
  $ btrfs send -p /mnt/mysnap1 -f /tmp/2.snap /mnt/mysnap2
  ERROR: send ioctl failed with -5: Input/output error

This issue started happening back in 2015 when deduplication was updated
to not update the inode's ctime and mtime and update only the iversion.
Back then we would hit a BUG_ON() in send, but later in 2016 send was
updated to return -EIO and print the error message instead of doing the
BUG_ON().

A test case for fstests follows soon.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203933
Fixes: 1c919a5e13702c ("btrfs: don't update mtime/ctime on deduped inodes")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman

---
 fs/btrfs/send.c |   77 ++++++++++----------------------------------------------
 1 file changed, 15 insertions(+), 62 deletions(-)

--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -5835,68 +5835,21 @@ static int changed_extent(struct send_ct
 {
 	int ret = 0;
 
-	if (sctx->cur_ino != sctx->cmp_key->objectid) {
-
-		if (result == BTRFS_COMPARE_TREE_CHANGED) {
-			struct extent_buffer *leaf_l;
-			struct extent_buffer *leaf_r;
-			struct btrfs_file_extent_item *ei_l;
-			struct btrfs_file_extent_item *ei_r;
-
-			leaf_l = sctx->left_path->nodes[0];
-			leaf_r = sctx->right_path->nodes[0];
-			ei_l = btrfs_item_ptr(leaf_l,
-					      sctx->left_path->slots[0],
-					      struct btrfs_file_extent_item);
-			ei_r = btrfs_item_ptr(leaf_r,
-					      sctx->right_path->slots[0],
-					      struct btrfs_file_extent_item);
-
-			/*
-			 * We may have found an extent item that has changed
-			 * only its disk_bytenr field and the corresponding
-			 * inode item was not updated. This case happens due to
-			 * very specific timings during relocation when a leaf
-			 * that contains file extent items is COWed while
-			 * relocation is ongoing and its in the stage where it
-			 * updates data pointers. So when this happens we can
-			 * safely ignore it since we know it's the same extent,
-			 * but just at different logical and physical locations
-			 * (when an extent is fully replaced with a new one, we
-			 * know the generation number must have changed too,
-			 * since snapshot creation implies committing the current
-			 * transaction, and the inode item must have been updated
-			 * as well).
-			 * This replacement of the disk_bytenr happens at
-			 * relocation.c:replace_file_extents() through
-			 * relocation.c:btrfs_reloc_cow_block().
-			 */
-			if (btrfs_file_extent_generation(leaf_l, ei_l) ==
-			    btrfs_file_extent_generation(leaf_r, ei_r) &&
-			    btrfs_file_extent_ram_bytes(leaf_l, ei_l) ==
-			    btrfs_file_extent_ram_bytes(leaf_r, ei_r) &&
-			    btrfs_file_extent_compression(leaf_l, ei_l) ==
-			    btrfs_file_extent_compression(leaf_r, ei_r) &&
-			    btrfs_file_extent_encryption(leaf_l, ei_l) ==
-			    btrfs_file_extent_encryption(leaf_r, ei_r) &&
-			    btrfs_file_extent_other_encoding(leaf_l, ei_l) ==
-			    btrfs_file_extent_other_encoding(leaf_r, ei_r) &&
-			    btrfs_file_extent_type(leaf_l, ei_l) ==
-			    btrfs_file_extent_type(leaf_r, ei_r) &&
-			    btrfs_file_extent_disk_bytenr(leaf_l, ei_l) !=
-			    btrfs_file_extent_disk_bytenr(leaf_r, ei_r) &&
-			    btrfs_file_extent_disk_num_bytes(leaf_l, ei_l) ==
-			    btrfs_file_extent_disk_num_bytes(leaf_r, ei_r) &&
-			    btrfs_file_extent_offset(leaf_l, ei_l) ==
-			    btrfs_file_extent_offset(leaf_r, ei_r) &&
-			    btrfs_file_extent_num_bytes(leaf_l, ei_l) ==
-			    btrfs_file_extent_num_bytes(leaf_r, ei_r))
-				return 0;
-		}
-
-		inconsistent_snapshot_error(sctx, result, "extent");
-		return -EIO;
-	}
+	/*
+	 * We have found an extent item that changed without the inode item
+	 * having changed. This can happen either after relocation (where the
+	 * disk_bytenr of an extent item is replaced at
+	 * relocation.c:replace_file_extents()) or after deduplication into a
+	 * file in both the parent and send snapshots (where an extent item can
+	 * get modified or replaced with a new one). Note that deduplication
+	 * updates the inode item, but it only changes the iversion (sequence
+	 * field in the inode item) of the inode, so if a file is deduplicated
+	 * the same number of times in both the parent and send snapshots, its
+	 * iversion becomes the same in both snapshots, whence the inode item is
+	 * the same on both snapshots.
+	 */
+	if (sctx->cur_ino != sctx->cmp_key->objectid)
+		return 0;
 
 	if (!sctx->cur_inode_new_gen && !sctx->cur_inode_deleted) {
 		if (result != BTRFS_COMPARE_TREE_DELETED)