From: Mariusz Tkaczyk
To: song@kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 2/2] raid5: introduce MD_BROKEN
Date: Fri, 17 Sep 2021 17:18:31 +0200
Message-Id: <20210917151831.3000-3-mariusz.tkaczyk@linux.intel.com>
In-Reply-To: <20210917151831.3000-1-mariusz.tkaczyk@linux.intel.com>

The raid456 module used to allow the array to reach a failed state,
unlike the other redundant levels. This was fixed by fb73b357fb9
("raid5: block failing device if raid will be failed").

That fix introduced a bug: if a raid5 array fails during IO, the result
may be a hung task that never completes. The Faulty flag on the device
is needed to process all pending requests and is checked in many
places, mainly in analyse_stripe().

Allow the Faulty flag to be set on the drive again, and set MD_BROKEN
when the array has failed.

Fixes: fb73b357fb9 ("raid5: block failing device if raid will be failed")
Signed-off-by: Mariusz Tkaczyk
---
 drivers/md/raid5.c | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 02ed53b20654..43e1ff43a222 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -690,6 +690,9 @@ static int has_failed(struct r5conf *conf)
 {
 	int degraded;
 
+	if (test_bit(MD_BROKEN, &conf->mddev->flags))
+		return 1;
+
 	if (conf->mddev->reshape_position == MaxSector)
 		return conf->mddev->degraded > conf->max_degraded;
 
@@ -2877,34 +2880,29 @@ static void raid5_error(struct mddev *mddev, struct md_rdev *rdev)
 	unsigned long flags;
 	pr_debug("raid456: error called\n");
 
-	spin_lock_irqsave(&conf->device_lock, flags);
-
-	if (test_bit(In_sync, &rdev->flags) &&
-	    mddev->degraded == conf->max_degraded) {
-		/*
-		 * Don't allow to achieve failed state
-		 * Don't try to recover this device
-		 */
-		conf->recovery_disabled = mddev->recovery_disabled;
-		spin_unlock_irqrestore(&conf->device_lock, flags);
-		return;
-	}
+	pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n",
+		mdname(mddev), bdevname(rdev->bdev, b));
 
+	spin_lock_irqsave(&conf->device_lock, flags);
 	set_bit(Faulty, &rdev->flags);
 	clear_bit(In_sync, &rdev->flags);
 	mddev->degraded = raid5_calc_degraded(conf);
+
+	if (has_failed(conf)) {
+		set_bit(MD_BROKEN, &mddev->flags);
+		conf->recovery_disabled = mddev->recovery_disabled;
+		pr_crit("md/raid:%s: Cannot continue on %d devices.\n",
+			mdname(mddev), conf->raid_disks - mddev->degraded);
+	} else
+		pr_crit("md/raid:%s: Operation continuing on %d devices.\n",
+			mdname(mddev), conf->raid_disks - mddev->degraded);
+
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 
 	set_bit(Blocked, &rdev->flags);
 	set_mask_bits(&mddev->sb_flags, 0,
 		      BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
-	pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n"
-		"md/raid:%s: Operation continuing on %d devices.\n",
-		mdname(mddev),
-		bdevname(rdev->bdev, b),
-		mdname(mddev),
-		conf->raid_disks - mddev->degraded);
 	r5c_update_on_rdev_error(mddev, rdev);
 }
 
-- 
2.26.2
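To make the intended state transitions easier to review, here is a minimal userspace model of the patched has_failed()/raid5_error() pair. It is a sketch under stated assumptions, not kernel code: struct model_conf, model_has_failed() and model_error() are hypothetical names, MD_BROKEN is modeled as a plain bool, degraded is a simple counter rather than raid5_calc_degraded(), and locking, superblock updates and reshape handling are left out.

```c
/*
 * Hypothetical userspace model of the patched raid5_error()/has_failed()
 * logic. Not kernel code: locking, superblock updates and reshape are
 * deliberately left out.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_MAX_DISKS 16

struct model_conf {
	int raid_disks;               /* member devices in the array */
	int max_degraded;             /* 1 for RAID5, 2 for RAID6 */
	int degraded;                 /* members currently marked faulty */
	bool faulty[MODEL_MAX_DISKS]; /* stands in for the Faulty rdev flag */
	bool broken;                  /* stands in for MD_BROKEN in mddev->flags */
};

/* Like the patched has_failed(): once broken, the array stays failed. */
static int model_has_failed(const struct model_conf *conf)
{
	if (conf->broken)
		return 1;
	return conf->degraded > conf->max_degraded;
}

/*
 * Like the patched raid5_error(): always mark the member faulty (no early
 * return), then set broken if redundancy is exhausted. Returns the number
 * of members still working.
 */
static int model_error(struct model_conf *conf, int disk)
{
	assert(conf->raid_disks <= MODEL_MAX_DISKS);
	assert(disk >= 0 && disk < conf->raid_disks);

	if (!conf->faulty[disk]) {
		conf->faulty[disk] = true;
		conf->degraded++; /* simplified stand-in for raid5_calc_degraded() */
	}

	if (model_has_failed(conf)) {
		conf->broken = true;
		printf("Cannot continue on %d devices.\n",
		       conf->raid_disks - conf->degraded);
	} else {
		printf("Operation continuing on %d devices.\n",
		       conf->raid_disks - conf->degraded);
	}
	return conf->raid_disks - conf->degraded;
}
```

With raid_disks = 4 and max_degraded = 1 (RAID5), a first model_error() call reports 3 working devices. A second call still marks the disk faulty, then sets broken, after which model_has_failed() keeps returning 1 regardless of the degraded count. The point mirrored from the patch is that the Faulty flag is always set, instead of returning early as fb73b357fb9 did, so code that inspects per-device state can still see the failure.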