Received: by 2002:a05:7412:8d1c:b0:fa:4c10:6cad with SMTP id bj28csp43655rdb; Tue, 16 Jan 2024 14:27:01 -0800 (PST) X-Google-Smtp-Source: AGHT+IGn84y/xWB3iCR5GPZpyagIQqJVAqjf7fR6fFWeXBe+oHa3b3EUN5JZbQr0fX4Vk3V91ufQ X-Received: by 2002:a05:6870:d108:b0:1fb:51e:bc17 with SMTP id e8-20020a056870d10800b001fb051ebc17mr10846277oac.30.1705444021057; Tue, 16 Jan 2024 14:27:01 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705444021; cv=pass; d=google.com; s=arc-20160816; b=YaSE6hY4Zef5Sz+xYyDO8PbLO8SZJ7GkvdYnTnAYxiswlsXoNKZ/mQM5440NIfcSe9 5n/JHHB87buofWiAPDwnoHeY7KJBxQhn5TzcQ1zg7jrdKvm/UUprPpE+COxHjekZtUpE VARtU2fQQ80llrCB6AZGEBW/bHGEjvwTZN2IG4bBq9PPSNZwOaWLmVzP/+I9K0bCXuAh ++auQ3qIbaKOOfdEonSsPjQOWZYSOZOo2t/E4ttcn797mdCJcNif5QbXvx2LM7drPtYD FmjyE7uQkxK30lSBdTb0Zf9FdTSL/b0cVyDzyb4Syr8dWwx286jDeFi/9vAt3XjUGWPi BHEA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=DwI4JmZbvmkGOfNLEd6lX3fbCTzP41L+p9fJ5shMl4o=; fh=9EoZ/7r9SqpL9V+T5hzCl0jtaj45F7ylPQ3ZnSTtitk=; b=aofvoNNJsNTha68DtHlOuVy/aoo5ctjBJeG36ocqAcr5IQkRrtyEGxiGT1y/a/0o/C ODFZIjP1O3C/p7Zg9DkWwMEQz5HGL9/ES2QA/d1psa2ABQaOMk/0/tgorkfrqK63eM6l EEUOUhOM790WQOXGQAM4zZ7RMAr7YxFib4gDXkuukLE8r/rZFBwhT6rwhypixpH9PjrU djo5Rb5XBGBHWp6DCfL/NGBTsUUy5WQiFmOgjAQ5urVZ70+oFkf6pM2oAQpCS7eK5WPX +3klESdgeMNMBfTUI/m2UfH88YpgXpM3X3YCdMQU6DdRxfAnCyYYUhpgfsUqDd8N4hJc GPHQ== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=CxbLFNbT; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-28229-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-28229-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [2604:1380:40f1:3f00::1]) by mx.google.com with ESMTPS id t11-20020a65408b000000b005ce418d5dd5si12052741pgp.145.2024.01.16.14.27.00 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 16 Jan 2024 14:27:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-28229-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=CxbLFNbT; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-28229-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-28229-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id E8DA6B255CB for ; Tue, 16 Jan 2024 22:06:05 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 5055767A1C; Tue, 16 Jan 2024 20:05:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CxbLFNbT" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5618467A01; Tue, 16 Jan 2024 20:05:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705435505; cv=none; b=Dt/B+6JY6pQuRoQxy48tzzYBSqW2/ZglvYv77j7K4F3UgJxNeKQCOjkz2RjbrhAYN99BAR1qCDhGOASXSN8G0WuJNcvEnjbRKQ78UHvWenYj+Vj6M5lf8xsyi8oyjR+LxzBDsBwpO2IetTd3ztF8HRfpGTOzPLyh3xibTiF6sww= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705435505; c=relaxed/simple; bh=ojpqifFXAOWVDVr6GK2OGx/tKy+EZa6kBJ78lRV0shg=; h=Received:DKIM-Signature:From:To:Cc:Subject:Date:Message-ID: X-Mailer:In-Reply-To:References:MIME-Version:X-stable: X-Patchwork-Hint:X-stable-base:Content-Transfer-Encoding; b=EBr5gjjeHQG/qM04067nKNv59pVJHj0szaCc2oyKbCBsx3PYo2JLQAI/5UefRFaxVrF2E/Lg/mlJ5nXhTbdQ2JanWkITxXCxOwY08QbRmOzNlT5U7sFpEcC2FHorjuzOPWFhxhvEMffC6sekHHlx+Cxi0c/MUtJrko/ih0cTjBM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CxbLFNbT; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5FD3AC433F1; Tue, 16 Jan 2024 20:05:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705435505; bh=ojpqifFXAOWVDVr6GK2OGx/tKy+EZa6kBJ78lRV0shg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CxbLFNbTzAI60/T0n01W+sFkMRj1G0Pg2T3Nr/yMzDjOw7CSMNDqkejgqpDH4o6+g qK/UFurpCzHnSFtNakDddmHVJ9jFJBqXuMm82doi+HGqH/oSIQziq2iorlJOGauTon ThLbxECQgHfgIUckjrm++deLhyRuwztoTwk3JoKSKaYCnZAMJxKyWCLdCiL/ZY6B9p POvg/II2ln29WdI2Ry4YZk1/Vx7nz/eRUnq0K9QvKYNDq9C9S6WeaeL76nrw9UajMi Jdg1ctUdBueldVbUenBua5zW6co55Z+CTK5+8rSGcc+aaWdkCspbDmNEeO92xFPZh3 aq6wZR+yPkmpw== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Alex Lyakas , Song Liu , Sasha Levin , linux-raid@vger.kernel.org Subject: [PATCH AUTOSEL 4.19 18/22] md: Whenassemble the array, consult the superblock of the freshest device Date: Tue, 16 Jan 2024 15:04:12 -0500 Message-ID: <20240116200432.260016-18-sashal@kernel.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240116200432.260016-1-sashal@kernel.org> References: <20240116200432.260016-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 4.19.305 Content-Transfer-Encoding: 8bit From: Alex Lyakas [ Upstream commit dc1cc22ed58f11d58d8553c5ec5f11cbfc3e3039 ] Upon assembling the array, both kernel and mdadm allow the devices to have event counter difference of 1, and still consider them as up-to-date. However, a device whose event count is behind by 1, may in fact not be up-to-date, and array resync with such a device may cause data corruption. To avoid this, consult the superblock of the freshest device about the status of a device, whose event counter is behind by 1. Signed-off-by: Alex Lyakas Signed-off-by: Song Liu Link: https://lore.kernel.org/r/1702470271-16073-1-git-send-email-alex.lyakas@zadara.com Signed-off-by: Sasha Levin --- drivers/md/md.c | 54 ++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 44 insertions(+), 10 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 6b074c2202d5..3cc28b283607 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -1034,6 +1034,7 @@ struct super_type { struct md_rdev *refdev, int minor_version); int (*validate_super)(struct mddev *mddev, + struct md_rdev *freshest, struct md_rdev *rdev); void (*sync_super)(struct mddev *mddev, struct md_rdev *rdev); @@ -1160,8 +1161,9 @@ static int super_90_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor /* * validate_super for 0.90.0 + * note: we are not using "freshest" for 0.9 superblock */ -static int super_90_validate(struct mddev *mddev, struct md_rdev *rdev) +static int super_90_validate(struct mddev *mddev, struct md_rdev *freshest, struct md_rdev *rdev) { mdp_disk_t *desc; mdp_super_t *sb = page_address(rdev->sb_page); @@ -1665,7 +1667,7 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_ return ret; } -static int super_1_validate(struct mddev *mddev, struct md_rdev *rdev) +static int super_1_validate(struct mddev *mddev, struct md_rdev *freshest, struct md_rdev *rdev) { struct mdp_superblock_1 *sb = page_address(rdev->sb_page); __u64 ev1 = le64_to_cpu(sb->events); @@ -1761,13 +1763,15 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *rdev) } } else if (mddev->pers == NULL) { /* Insist of good event counter while assembling, except for - * spares (which don't need an event count) */ - ++ev1; + * spares (which don't need an event count). + * Similar to mdadm, we allow event counter difference of 1 + * from the freshest device. + */ if (rdev->desc_nr >= 0 && rdev->desc_nr < le32_to_cpu(sb->max_dev) && (le16_to_cpu(sb->dev_roles[rdev->desc_nr]) < MD_DISK_ROLE_MAX || le16_to_cpu(sb->dev_roles[rdev->desc_nr]) == MD_DISK_ROLE_JOURNAL)) - if (ev1 < mddev->events) + if (ev1 + 1 < mddev->events) return -EINVAL; } else if (mddev->bitmap) { /* If adding to array with a bitmap, then we can accept an @@ -1788,8 +1792,38 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *rdev) rdev->desc_nr >= le32_to_cpu(sb->max_dev)) { role = MD_DISK_ROLE_SPARE; rdev->desc_nr = -1; - } else + } else if (mddev->pers == NULL && freshest && ev1 < mddev->events) { + /* + * If we are assembling, and our event counter is smaller than the + * highest event counter, we cannot trust our superblock about the role. + * It could happen that our rdev was marked as Faulty, and all other + * superblocks were updated with +1 event counter. + * Then, before the next superblock update, which typically happens when + * remove_and_add_spares() removes the device from the array, there was + * a crash or reboot. + * If we allow current rdev without consulting the freshest superblock, + * we could cause data corruption. + * Note that in this case our event counter is smaller by 1 than the + * highest, otherwise, this rdev would not be allowed into array; + * both kernel and mdadm allow event counter difference of 1. + */ + struct mdp_superblock_1 *freshest_sb = page_address(freshest->sb_page); + u32 freshest_max_dev = le32_to_cpu(freshest_sb->max_dev); + + if (rdev->desc_nr >= freshest_max_dev) { + /* this is unexpected, better not proceed */ + pr_warn("md: %s: rdev[%pg]: desc_nr(%d) >= freshest(%pg)->sb->max_dev(%u)\n", + mdname(mddev), rdev->bdev, rdev->desc_nr, + freshest->bdev, freshest_max_dev); + return -EUCLEAN; + } + + role = le16_to_cpu(freshest_sb->dev_roles[rdev->desc_nr]); + pr_debug("md: %s: rdev[%pg]: role=%d(0x%x) according to freshest %pg\n", + mdname(mddev), rdev->bdev, role, role, freshest->bdev); + } else { role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]); + } switch(role) { case MD_DISK_ROLE_SPARE: /* spare */ break; @@ -2691,7 +2725,7 @@ static int add_bound_rdev(struct md_rdev *rdev) * and should be added immediately. */ super_types[mddev->major_version]. - validate_super(mddev, rdev); + validate_super(mddev, NULL/*freshest*/, rdev); if (add_journal) mddev_suspend(mddev); err = mddev->pers->hot_add_disk(mddev, rdev); @@ -3593,7 +3627,7 @@ static void analyze_sbs(struct mddev *mddev) } super_types[mddev->major_version]. - validate_super(mddev, freshest); + validate_super(mddev, NULL/*freshest*/, freshest); i = 0; rdev_for_each_safe(rdev, tmp, mddev) { @@ -3608,7 +3642,7 @@ static void analyze_sbs(struct mddev *mddev) } if (rdev != freshest) { if (super_types[mddev->major_version]. - validate_super(mddev, rdev)) { + validate_super(mddev, freshest, rdev)) { pr_warn("md: kicking non-fresh %s from array!\n", bdevname(rdev->bdev,b)); md_kick_rdev_from_array(rdev); @@ -6453,7 +6487,7 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info) rdev->saved_raid_disk = rdev->raid_disk; } else super_types[mddev->major_version]. - validate_super(mddev, rdev); + validate_super(mddev, NULL/*freshest*/, rdev); if ((info->state & (1<raid_disk != info->raid_disk) { /* This was a hot-add request, but events doesn't -- 2.43.0