From: Dan Moulding
To: yukuai1@huaweicloud.com
Cc: dan@danm.net,
 gregkh@linuxfoundation.org, junxiao.bi@oracle.com,
 linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
 regressions@lists.linux.dev, song@kernel.org, stable@vger.kernel.org,
 yukuai3@huawei.com
Subject: Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected
Date: Thu, 14 Mar 2024 10:12:11 -0600
Message-ID: <20240314161211.14002-1-dan@danm.net>
X-Mailing-List: linux-kernel@vger.kernel.org

> How about the following patch?
>
> Thanks,
> Kuai
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 3ad5f3c7f91e..0b2e6060f2c9 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6720,7 +6720,6 @@ static void raid5d(struct md_thread *thread)
>
>  	md_check_recovery(mddev);
>
> -	blk_start_plug(&plug);
>  	handled = 0;
>  	spin_lock_irq(&conf->device_lock);
>  	while (1) {
> @@ -6728,6 +6727,14 @@ static void raid5d(struct md_thread *thread)
>  		int batch_size, released;
>  		unsigned int offset;
>
> +		/*
> +		 * md_check_recovery() can't clear sb_flags, usually because of
> +		 * 'reconfig_mutex' can't be grabbed, wait for mddev_unlock() to
> +		 * wake up raid5d().
> +		 */
> +		if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
> +			goto skip;
> +
>  		released = release_stripe_list(conf, conf->temp_inactive_list);
>  		if (released)
>  			clear_bit(R5_DID_ALLOC, &conf->cache_state);
> @@ -6766,8 +6773,8 @@ static void raid5d(struct md_thread *thread)
>  			spin_lock_irq(&conf->device_lock);
>  		}
>  	}
> +skip:
>  	pr_debug("%d stripes handled\n", handled);
> -
>  	spin_unlock_irq(&conf->device_lock);
>  	if (test_and_clear_bit(R5_ALLOC_MORE, &conf->cache_state) &&
>  	    mutex_trylock(&conf->cache_size_mutex)) {
> @@ -6779,6 +6786,7 @@ static void raid5d(struct md_thread *thread)
>  		mutex_unlock(&conf->cache_size_mutex);
>  	}
>
> +	blk_start_plug(&plug);
>  	flush_deferred_bios(conf);
>
>  	r5l_flush_stripe_to_raid(conf->log);

I can confirm that this patch also works. I'm unable to reproduce the
hang after applying this instead of the first patch provided by
Junxiao. So it looks like both approaches are successful in avoiding
the hang.

-- Dan