Received: by 2002:a05:6358:1087:b0:cb:c9d3:cd90 with SMTP id j7csp2421662rwi; Tue, 11 Oct 2022 08:34:26 -0700 (PDT) X-Google-Smtp-Source: AMsMyM63fL60yN+cQVlekQj8aBseSWL618WB3EkO3ynhwYReRmS/2qTV7UW8VaCpxby2ThUxWk8X X-Received: by 2002:a17:90b:33d0:b0:20d:5edf:8b92 with SMTP id lk16-20020a17090b33d000b0020d5edf8b92mr6683763pjb.117.1665502465825; Tue, 11 Oct 2022 08:34:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665502465; cv=none; d=google.com; s=arc-20160816; b=ozYVZMNk2DQ+24t4vryQVeY17r1k9i4RKrChRyFyCJd2MSuNEvl578+BMgJVZELe8m DLRvx6adwUb+Gs5xJAh/eFwEUfDn/tPozkebwcR7cFJqUkKfyLzxENjE4TbBeujzQWUj ZsD/PIZ3XlpxuP2+PSVhiA3FrLu/748QBCcPmNe6ABgQYGKu16rgJADC2mk5nxu5gB/A 2Ar6CW3ucWTggGtlJZQthjJg3Ugg0EwBYsjcRe22RVSPt671KEKb6dbLhUwa3Fk8Mmsf n2i1FLm0iZpS6F3lmIb6S4IZc2Udl96CRKTExbSNlny7+BsRk5C/STLpV8cxvus51ARd fFUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=naXRTu6GbaBjr6XjmsnS/U1ngpb1TzcdcMX+UAYW8jo=; b=YKDjbo7FqXmSp/mA0SJolNnZl1y/Bh2chcMWXSieUE2ZJ9HhUQ5EFg+oDzaT96dzJH R+Wfp1TnpLuoopPXbYCjJwUPELbIr+TjVpke7bB0EoNUgQ1pbyphwiYjs4T9WY5NeoR+ iPvsQ6JO5NHw+E6P5T5hCIT83UArXoIHy/UqesPFDCNTB59UucaTIBCoLWxQ2cipJzUP +y+pbTyG8KiLpontYhO7/2cq0mv6WaTYVHrJxd4H3OITAec10gTgIFQq8vwbx2Vb83H2 jQhCSJxMpHKezuEudrtn/13w7GqB5FctNT2UAuCrgHNDQkjV3UhR/8xltUn8F13CMvcc 89iA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=ld9fh0W+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a9-20020aa78e89000000b00538147a4b3csi13760380pfr.318.2022.10.11.08.34.14; Tue, 11 Oct 2022 08:34:25 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=ld9fh0W+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230344AbiJKOxv (ORCPT + 99 others); Tue, 11 Oct 2022 10:53:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35784 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230210AbiJKOwd (ORCPT ); Tue, 11 Oct 2022 10:52:33 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 829A39AFBD; Tue, 11 Oct 2022 07:51:15 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id DD00C611E3; Tue, 11 Oct 2022 14:51:12 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3BB07C433D6; Tue, 11 Oct 2022 14:51:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665499872; bh=RuO3TiQT3MPZfe0XWNNRoVD5zZTWmMA9MKqHD83CPlE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ld9fh0W+urmg8hbLuyypP9Bfl8DnqkzuXXD/FH8LfZNAOLm+pRISW60660vTEoguI CpjOM1Ab2fa+JWKTboKEYckWbKUz/0MC1ON5vV5gvLcYsKMC4Df+7Q+gcPdRW7bLlB rhv39qC4uBWfFXtVtalTBmhZoQlnhbuOvAfoVzLvmzjFPlmUSezo7Y7V3nubLjQJ2W 9KzJlcx5nwmCHV3FWZkPK25unwLdQRuRgmF7MJwKWL7m3fjSawTotaaVceFQp7q3me ioePcCfdfEFSPlxIRBO7kJ1fc1dl3CRNDDPL1cCV2X8u5/GTxmtXRgKTu8VH2bmmd/ fDBJWCfUBnyFA== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Qu Wenruo , Goffredo Baroncelli , Anand Jain , David Sterba , Sasha Levin , clm@fb.com, josef@toxicpanda.com, linux-btrfs@vger.kernel.org Subject: [PATCH AUTOSEL 6.0 36/46] btrfs: check superblock to ensure the fs was not modified at thaw time Date: Tue, 11 Oct 2022 10:50:04 -0400 Message-Id: <20221011145015.1622882-36-sashal@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221011145015.1622882-1-sashal@kernel.org> References: <20221011145015.1622882-1-sashal@kernel.org> MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Qu Wenruo [ Upstream commit a05d3c9153145283ce9c58a1d7a9056fbb85f6a1 ] [BACKGROUND] There is an incident report that, one user hibernated the system, with one btrfs on removable device still mounted. Then by some incident, the btrfs got mounted and modified by another system/OS, then back to the hibernated system. After resuming from the hibernation, new write happened into the victim btrfs. Now the fs is completely broken, since the underlying btrfs is no longer the same one before the hibernation, and the user lost their data due to various transid mismatch. [REPRODUCER] We can emulate the situation using the following small script: truncate -s 1G $dev mkfs.btrfs -f $dev mount $dev $mnt fsstress -w -d $mnt -n 500 sync xfs_freeze -f $mnt cp $dev $dev.backup # There is no way to mount the same cloned fs on the same system, # as the conflicting fsid will be rejected by btrfs. # Thus here we have to wipe the fs using a different btrfs. mkfs.btrfs -f $dev.backup dd if=$dev.backup of=$dev bs=1M xfs_freeze -u $mnt fsstress -w -d $mnt -n 20 umount $mnt btrfs check $dev The final fsck will fail due to some tree blocks has incorrect fsid. This is enough to emulate the problem hit by the unfortunate user. [ENHANCEMENT] Although such case should not be that common, it can still happen from time to time. From the view of btrfs, we can detect any unexpected super block change, and if there is any unexpected change, we just mark the fs read-only, and thaw the fs. By this we can limit the damage to minimal, and I hope no one would lose their data by this anymore. Suggested-by: Goffredo Baroncelli Link: https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/ Reviewed-by: Anand Jain Signed-off-by: Qu Wenruo Signed-off-by: David Sterba Signed-off-by: Sasha Levin --- fs/btrfs/disk-io.c | 25 ++++++++++++++----- fs/btrfs/disk-io.h | 4 +++- fs/btrfs/super.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.c | 2 +- 4 files changed, 83 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index e0e1730e67d7..d9881b54efd1 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2600,8 +2600,8 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info) * 1, 2 2nd and 3rd backup copy * -1 skip bytenr check */ -static int validate_super(struct btrfs_fs_info *fs_info, - struct btrfs_super_block *sb, int mirror_num) +int btrfs_validate_super(struct btrfs_fs_info *fs_info, + struct btrfs_super_block *sb, int mirror_num) { u64 nodesize = btrfs_super_nodesize(sb); u64 sectorsize = btrfs_super_sectorsize(sb); @@ -2785,7 +2785,7 @@ static int validate_super(struct btrfs_fs_info *fs_info, */ static int btrfs_validate_mount_super(struct btrfs_fs_info *fs_info) { - return validate_super(fs_info, fs_info->super_copy, 0); + return btrfs_validate_super(fs_info, fs_info->super_copy, 0); } /* @@ -2799,7 +2799,7 @@ static int btrfs_validate_write_super(struct btrfs_fs_info *fs_info, { int ret; - ret = validate_super(fs_info, sb, -1); + ret = btrfs_validate_super(fs_info, sb, -1); if (ret < 0) goto out; if (!btrfs_supported_super_csum(btrfs_super_csum_type(sb))) { @@ -3846,7 +3846,7 @@ static void btrfs_end_super_write(struct bio *bio) } struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, - int copy_num) + int copy_num, bool drop_cache) { struct btrfs_super_block *super; struct page *page; @@ -3864,6 +3864,19 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, if (bytenr + BTRFS_SUPER_INFO_SIZE >= bdev_nr_bytes(bdev)) return ERR_PTR(-EINVAL); + if (drop_cache) { + /* This should only be called with the primary sb. */ + ASSERT(copy_num == 0); + + /* + * Drop the page of the primary superblock, so later read will + * always read from the device. + */ + invalidate_inode_pages2_range(mapping, + bytenr >> PAGE_SHIFT, + (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT); + } + page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS); if (IS_ERR(page)) return ERR_CAST(page); @@ -3895,7 +3908,7 @@ struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev) * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ for (i = 0; i < 1; i++) { - super = btrfs_read_dev_one_super(bdev, i); + super = btrfs_read_dev_one_super(bdev, i, false); if (IS_ERR(super)) continue; diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 47ad8e0a2d33..aef981de672c 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -46,10 +46,12 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices, char *options); void __cold close_ctree(struct btrfs_fs_info *fs_info); +int btrfs_validate_super(struct btrfs_fs_info *fs_info, + struct btrfs_super_block *sb, int mirror_num); int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors); struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev); struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, - int copy_num); + int copy_num, bool drop_cache); int btrfs_commit_super(struct btrfs_fs_info *fs_info); struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root, struct btrfs_key *key); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index f1c6ca59299e..692983e69ba1 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2553,11 +2553,71 @@ static int btrfs_freeze(struct super_block *sb) return btrfs_commit_transaction(trans); } +static int check_dev_super(struct btrfs_device *dev) +{ + struct btrfs_fs_info *fs_info = dev->fs_info; + struct btrfs_super_block *sb; + int ret = 0; + + /* This should be called with fs still frozen. */ + ASSERT(test_bit(BTRFS_FS_FROZEN, &fs_info->flags)); + + /* Missing dev, no need to check. */ + if (!dev->bdev) + return 0; + + /* Only need to check the primary super block. */ + sb = btrfs_read_dev_one_super(dev->bdev, 0, true); + if (IS_ERR(sb)) + return PTR_ERR(sb); + + /* Btrfs_validate_super() includes fsid check against super->fsid. */ + ret = btrfs_validate_super(fs_info, sb, 0); + if (ret < 0) + goto out; + + if (btrfs_super_generation(sb) != fs_info->last_trans_committed) { + btrfs_err(fs_info, "transid mismatch, has %llu expect %llu", + btrfs_super_generation(sb), + fs_info->last_trans_committed); + ret = -EUCLEAN; + goto out; + } +out: + btrfs_release_disk_super(sb); + return ret; +} + static int btrfs_unfreeze(struct super_block *sb) { struct btrfs_fs_info *fs_info = btrfs_sb(sb); + struct btrfs_device *device; + int ret = 0; + /* + * Make sure the fs is not changed by accident (like hibernation then + * modified by other OS). + * If we found anything wrong, we mark the fs error immediately. + * + * And since the fs is frozen, no one can modify the fs yet, thus + * we don't need to hold device_list_mutex. + */ + list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) { + ret = check_dev_super(device); + if (ret < 0) { + btrfs_handle_fs_error(fs_info, ret, + "super block on devid %llu got modified unexpectedly", + device->devid); + break; + } + } clear_bit(BTRFS_FS_FROZEN, &fs_info->flags); + + /* + * We still return 0, to allow VFS layer to unfreeze the fs even the + * above checks failed. Since the fs is either fine or read-only, we're + * safe to continue, without causing further damage. + */ return 0; } diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f63ff91e2883..b4df6f74855c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2017,7 +2017,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, struct page *page; int ret; - disk_super = btrfs_read_dev_one_super(bdev, copy_num); + disk_super = btrfs_read_dev_one_super(bdev, copy_num, false); if (IS_ERR(disk_super)) continue; -- 2.35.1