Received: by 2002:ab2:3141:0:b0:1ed:23cc:44d1 with SMTP id i1csp964457lqg; Sat, 2 Mar 2024 08:55:55 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCWwgFChe5hTC129GF4BOPxPK8zmsJL+/XDZe5WSKw3DCZRzR5VUj7RvBk1YLHXENaxSm+LarivU3eIjfUqJdgPOQYEyR7C3wNCJ9GrFwA== X-Google-Smtp-Source: AGHT+IFrqo6MGcUyXLi4FdH23dwMqtnE+x1HSZZG7snxlqB38AUtZoNPogYFO+xOiUqcYF/kA/ew X-Received: by 2002:a05:6e02:2163:b0:365:be3f:8414 with SMTP id s3-20020a056e02216300b00365be3f8414mr5412854ilv.4.1709398554807; Sat, 02 Mar 2024 08:55:54 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709398554; cv=pass; d=google.com; s=arc-20160816; b=u6yomN6PiuTseq/fEvlKIrdKn48VcoTD0AG2vznUmfv7Mx8cw6chCUapllG26XzCa+ gNQiPeeWTnEHBj2IiCbxZ9BNN5BjFjQ+bUXG5MBoVuRZwlYvY26cEUs/7J8fJHj38ZzS t3vHlwwRAptty7tVO7/KYuCW3iW7DgiH0ww/tFStJTSv6ShWXliJT/vPlOF7+m4qOFgM JkO4QMxZYJWw/wgYlXsq2bSEOky+57mCEAUOEFmI+YFxg22FpvNdzq9juyyflmLn/T36 IGM0JIBFFZ/j74tfZL/Pq9VCBdn9IJbevECZ7mMmC3kC/Fggd1+7FmdBl94ehvz8gzQZ JTxA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=FPJi0KKilosGRHeh4YPOA99JZNRJDTi/69AY3tYVgS0=; fh=8PcMJnXrNwInJ4w9yV5v5exUx7G/cXDVe7YV2xeGo+c=; b=pbCIzJgqVscn2j/IZXP7lUOts0o94niuVROpXibd3bqVM1PG2AfBKumFLf4XQxGitV hQCsUnVvBaIIIoIxOiAxPk+r+70FO+0Azzv7zNf6lokVbz8e/L+X0TS65JZZqQDaQDUt VTZE3iDShfYBleZRU8d2SjHIfdL6iSQQzPCySUlTsKbbvmAqKKJcBJB2qVcbSd+iUo62 u4T7eM/3E3h1neVBFJ5lguZX8JRnhE62OUF7lDDWxxeHcblADO9DEdSVvx9X3Vyhzvc8 LanJS67Qo+rLGOOXGUnVDX4BlOYSQiz/J/iyG3YkibqTxPGNrhJPG0qMWDYGVr9y4ge2 FmCg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@danm.net header.s=sig1 header.b=TDf4a8ID; arc=pass (i=1 spf=pass spfdomain=danm.net dkim=pass dkdomain=danm.net); spf=pass (google.com: domain of linux-kernel+bounces-89521-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-89521-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id 22-20020a631656000000b005dc8834047esi5466287pgw.422.2024.03.02.08.55.54 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 02 Mar 2024 08:55:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-89521-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@danm.net header.s=sig1 header.b=TDf4a8ID; arc=pass (i=1 spf=pass spfdomain=danm.net dkim=pass dkdomain=danm.net); spf=pass (google.com: domain of linux-kernel+bounces-89521-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-89521-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 724A228338A for ; Sat, 2 Mar 2024 16:55:54 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id D17462BAFD; Sat, 2 Mar 2024 16:55:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=danm.net header.i=@danm.net header.b="TDf4a8ID" Received: from mr85p00im-zteg06011601.me.com (mr85p00im-zteg06011601.me.com [17.58.23.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 433F52BAEF for ; Sat, 2 Mar 2024 16:55:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=17.58.23.186 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709398544; cv=none; b=u/FiVZrY5wMD+8GVkdt9x0FUyByxP/o7EDnDcV52D8BGSlWKFr0G+K6FdiSILNVYfMrd5Y1wsSYOMwUqZSgE0k+aWVLWSmA2h1gd86KnEYnUbrIVOo3gaudPiXPtaW+DV/Fc1vrVY6H4Mc2iVNPFPQne5ehS9wYQSe8/5tJTu/Y= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709398544; c=relaxed/simple; bh=r1hdEvOvTGAyYI2NDiZwnAMIDbGJLBZ33WTBhZkrnjM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YtFahOD0lX6sqUh2wSnZY7Uk+Ac0GfrIXxgAzV69EmBWoR5p/vAHQUnBC8zQIp7p6DFUzfKQzxkLbCm1GW2BGmG5ZRiCGoELM8tUpeLZGP4YN2RKe9GcS2LM73uRG1FBbyVwYFqyAkJ/35ckD78Wwr3Wqk5sX/bR6ULS4ckxwjg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=danm.net; spf=pass smtp.mailfrom=danm.net; dkim=pass (2048-bit key) header.d=danm.net header.i=@danm.net header.b=TDf4a8ID; arc=none smtp.client-ip=17.58.23.186 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=danm.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=danm.net DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danm.net; s=sig1; t=1709398542; bh=FPJi0KKilosGRHeh4YPOA99JZNRJDTi/69AY3tYVgS0=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=TDf4a8IDxq2gQlVap0Wk3kfwJ5WcxRYxP4IPhPWORKWLVepQU7Oha3FduDhdHXaOe 6XGVhxkvA7NZFH4wYhKYJeLTZ6/wRHYoQvwrwExdTQVDZXhH1ND9IQ5SreiOWcqkXZ G5rLpZuhxYRUmdxHng7YJvsAwKOD2ZP/hhaf5wYRUt3et4vL/LHFGUpXAkEpJ5YkdA gs2jGor9O7vl22du6hmgoMuwj4Cn80kIPBTNhZZZPI3ch4isRse7Gp6FVtpdMqD6yY /FOvRKumMMMKb67fzkSjngAbzaAB01pm/iut6AaJ+Q5ROmvIW0oDO/wJuNhG14xRyP ztC/a3GncVqGA== Received: from hitch.danm.net (mr38p00im-dlb-asmtp-mailmevip.me.com [17.57.152.18]) by mr85p00im-zteg06011601.me.com (Postfix) with ESMTPSA id A175B180384; Sat, 2 Mar 2024 16:55:40 +0000 (UTC) From: Dan Moulding To: junxiao.bi@oracle.com Cc: dan@danm.net, gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, regressions@lists.linux.dev, song@kernel.org, stable@vger.kernel.org, logang@deltatee.com Subject: Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected Date: Sat, 2 Mar 2024 09:55:38 -0700 Message-ID: <20240302165538.30761-1-dan@danm.net> X-Mailer: git-send-email 2.43.0 In-Reply-To: <739634c3-3e21-44dd-abb1-356cf54e54fd@oracle.com> References: <739634c3-3e21-44dd-abb1-356cf54e54fd@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Proofpoint-ORIG-GUID: JofO4N5HXtQWGery86c83noqRzdmhkBP X-Proofpoint-GUID: JofO4N5HXtQWGery86c83noqRzdmhkBP X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-03-02_04,2024-03-01_03,2023-05-22_02 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 adultscore=0 malwarescore=0 mlxlogscore=515 bulkscore=0 clxscore=1030 mlxscore=0 suspectscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2308100000 definitions=main-2403020146 > I have not root cause this yet, but would like share some findings from > the vmcore Dan shared. From what i can see, this doesn't look like a md > issue, but something wrong with block layer or below. Below is one other thing I found that might be of interest. This is from the original email thread [1] that was linked to in the original issue from 2022, which the change in question reverts: On 2022-09-02 17:46, Logan Gunthorpe wrote: > I've made some progress on this nasty bug. I've got far enough to know it's not > related to the blk-wbt or the block layer. > > Turns out a bunch of bios are stuck queued in a blk_plug in the md_raid5 > thread while that thread appears to be stuck in an infinite loop (so it never > schedules or does anything to flush the plug). > > I'm still debugging to try and find out the root cause of that infinite loop, > but I just wanted to send an update that the previous place I was stuck at > was not correct. > > Logan This certainly sounds like it has some similarities to what we are seeing when that change is reverted. The md0_raid5 thread appears to be in an infinite loop, consuming 100% CPU, but not actually doing any work. -- Dan [1] https://lore.kernel.org/r/7f3b87b6-b52a-f737-51d7-a4eec5c44112@deltatee.com